Prometheus is an outstanding metrics engine, but it was never designed to be a long-term storage system. Out of the box, kube-prometheus-stack retains metrics on local disk for 10 days (upstream Prometheus defaults to 15 days). For most production teams, that is not enough: capacity planning requires months of historical data, compliance can demand years, and multi-cluster visibility requires a way to query metrics across independent Prometheus instances. Thanos addresses each of these problems.

This guide walks through the complete integration of Thanos with kube-prometheus-stack, from enabling the Thanos sidecar in your Helm values.yaml to deploying a production-grade multi-cluster monitoring architecture with object storage backends. Every configuration example is drawn from real-world deployments and tested against the latest chart versions.

Why Prometheus Needs Long-Term Storage

Prometheus stores metrics in a local time-series database (TSDB) on disk. Each Prometheus instance manages its own data independently, protecting recent samples with a write-ahead log (WAL) and persisting them as immutable blocks every two hours. This local storage model is elegant and fast, but it creates several hard limitations that every production team eventually hits.

Retention is bounded by disk size. The default retention in kube-prometheus-stack is 10 days. You can increase it via prometheus.prometheusSpec.retention in your values.yaml, but you are always trading disk cost against retention window. A production cluster emitting 500,000 active time series at a 15-second scrape interval generates roughly 5 GB of TSDB data per day. Storing 90 days of that requires 450 GB of persistent volume — expensive on cloud providers and operationally fragile because a full disk crashes Prometheus entirely.

No global query view. When you run a separate Prometheus instance per cluster (the standard multi-cluster pattern), each instance only sees its own metrics. There is no built-in mechanism to query "total CPU usage across all clusters" or compare application performance between staging and production. Teams end up building manual aggregation dashboards in Grafana with multiple data sources, which is brittle and does not scale.

No downsampling. Prometheus keeps every scraped sample at full resolution for the entire retention window. For metrics older than a few days, you rarely need 15-second granularity; 5-minute or 1-hour aggregates suffice. Without downsampling, old data costs as much to store and query as new data, and long-range queries must read every raw sample.

Backup and disaster recovery are painful. Prometheus TSDB was not designed for routine backup and restore. Copying the data directory of a running server risks an inconsistent copy, and restoring a snapshot on a different node requires careful block management. Teams that rely on a single Prometheus as their only source of metrics data are one node failure away from losing everything.

What is Thanos and How Does It Work?

Thanos is a CNCF Incubating project that extends Prometheus with long-term storage, global query capabilities, and downsampling — without replacing Prometheus itself. The key design principle of Thanos is that it treats Prometheus as the collection engine and adds a durable storage layer on top using cheap object storage (S3, GCS, Azure Blob Storage).

Thanos works by deploying a sidecar container alongside each Prometheus instance. This sidecar does two critical things: first, it continuously uploads completed TSDB blocks (two-hour chunks of metrics data) to an object storage bucket; second, it exposes a gRPC StoreAPI that allows other Thanos components to query real-time data directly from the Prometheus instance. The result is that your Prometheus instances remain lightweight and short-lived (keeping only recent data on disk), while historical data lives permanently in object storage at a fraction of the cost.

On the query side, Thanos provides a Query component (sometimes called Thanos Querier) that implements the full Prometheus HTTP API. It fans out queries to all registered StoreAPI endpoints — sidecars for recent data, Store Gateways for historical data in object storage — and merges the results. From the perspective of Grafana or any PromQL client, Thanos Query looks exactly like a regular Prometheus server, but it can answer queries spanning months or years of data across multiple clusters.

Thanos Architecture Overview

A production Thanos deployment consists of several components, each handling a specific responsibility. Understanding how they interact is essential before writing any configuration.

Thanos Sidecar

Runs as a container alongside each Prometheus pod in your kube-prometheus-stack deployment. It uploads completed TSDB blocks to object storage every two hours and serves real-time queries via the StoreAPI. The sidecar is the entry point for integrating Thanos — it requires no changes to Prometheus itself, only a sidecar container specification and object storage credentials.

Thanos Store Gateway

A component that reads TSDB blocks from object storage and serves them via the StoreAPI. It holds no unique data: its local disk is only a cache of block metadata (and optionally index data) that minimizes object storage API calls and can be rebuilt from the bucket at any time. When Thanos Query needs historical data older than what is in Prometheus local storage, it queries the Store Gateway.

Thanos Query (Querier)

The central query aggregation layer. It discovers all StoreAPI endpoints (sidecars, Store Gateways, other Queriers, Receive nodes) and fans out incoming PromQL queries to the appropriate stores. It handles deduplication of metrics from high-availability Prometheus pairs and provides a unified Prometheus-compatible API. This is the component you point Grafana at.

Thanos Compactor

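Runs as a singleton (exactly one instance per object storage bucket) and operates exclusively on object storage data. It performs three operations: compaction (merging small blocks into larger ones to improve query performance), downsampling (creating 5-minute and 1-hour resolution copies of data once blocks pass fixed age thresholds, roughly 40 hours and 10 days respectively), and retention enforcement (deleting blocks older than the configured --retention.* limits). Compaction and retention keep object storage costs manageable over time, while downsampling keeps long-range queries fast.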

Thanos Ruler (Optional)

Evaluates Prometheus recording and alerting rules against Thanos Query instead of local Prometheus data. This is useful when rules need to operate on data spanning multiple clusters or long time ranges that exceed local Prometheus retention. Most deployments start without Ruler and add it later when cross-cluster alerting becomes a requirement.

Enabling Thanos Sidecar in Kube-Prometheus-Stack

The kube-prometheus-stack Helm chart has first-class support for the Thanos sidecar. Enabling it requires adding a few sections to your values.yaml. Here is the minimal configuration to get the sidecar running:

values.yaml — Thanos Sidecar Configuration
prometheus:
  prometheusSpec:
    # Reduce local retention since Thanos handles long-term storage
    retention: 2d
    retentionSize: 10GB

    # Enable the Thanos sidecar container
    thanos:
      enabled: true
      image: quay.io/thanos/thanos:v0.36.1
      version: v0.36.1

      # Object storage configuration (referenced as a Kubernetes Secret)
      objectStorageConfig:
        existingSecret:
          name: thanos-objstore-secret
          key: objstore.yml

    # External labels identify this Prometheus in the global view
    externalLabels:
      cluster: production-us-east-1
      environment: production

    # Required: disable compaction — Thanos Compactor handles this
    disableCompaction: true

  # Expose the Thanos sidecar gRPC port as a Service
  thanosService:
    enabled: true
    type: ClusterIP
    port: 10901
    targetPort: grpc

  # Optional: expose the sidecar outside the cluster (a LoadBalancer Service by default)
  # so that a Thanos Query running in another cluster can reach it
  thanosServiceExternal:
    enabled: false

There are several important details in this configuration that deserve explanation.

External labels are mandatory. Thanos uses externalLabels to distinguish metrics from different Prometheus instances and clusters. Every Prometheus instance that uploads data to the same object storage bucket must have a unique set of external labels. Without external labels, Thanos cannot deduplicate data from HA pairs or separate metrics from different clusters in query results. The cluster and environment labels shown above are a common pattern, but you can use any label names that make sense for your organization.
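For the HA case mentioned above, a values.yaml fragment might look like the following sketch. It assumes two Prometheus replicas and relies on the per-replica external label that prometheus-operator adds on its own; prometheus_replica is that label's default name, set explicitly here only to make the link to Thanos Query's --query.replica-label flag visible.

```yaml
# values.yaml (kube-prometheus-stack): HA pair sketch, values are illustrative
prometheus:
  prometheusSpec:
    # Two identical replicas scrape the same targets
    replicas: 2

    # Name of the external label the operator sets to the pod name of each replica.
    # It must match the --query.replica-label value passed to Thanos Query.
    replicaExternalLabelName: prometheus_replica

    # Shared by both replicas; identifies the cluster in the global view
    externalLabels:
      cluster: production-us-east-1
      environment: production
```

With this layout, the two replicas differ only in the prometheus_replica label, which is what allows Thanos Query to collapse them into a single series at query time.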

Disable compaction on Prometheus. Setting disableCompaction: true prevents Prometheus from compacting its local TSDB blocks. This is critical because the Thanos sidecar needs to upload complete, uncompacted blocks to object storage. If Prometheus compacts blocks locally, the sidecar may miss data or upload partially compacted blocks that conflict with what Thanos Compactor produces. The Thanos Compactor component takes over all compaction duties on the object storage side.

Reduced local retention. With Thanos handling long-term storage, you can aggressively reduce Prometheus local retention to 2 days or even less. This keeps your Prometheus PVCs small and cheap. The sidecar uploads completed blocks (every 2 hours), so even with a 2-day local retention, you have a comfortable buffer. Data older than the local retention window is served by the Store Gateway from object storage.

Configuring Object Storage (S3, GCS, Azure Blob)

The Thanos sidecar and Store Gateway need credentials and configuration to access your object storage bucket. This is provided via a YAML configuration file, typically stored as a Kubernetes Secret. Here are the configurations for each major cloud provider.

Amazon S3 Configuration

objstore-s3.yml
type: S3
config:
  bucket: my-thanos-metrics
  endpoint: s3.us-east-1.amazonaws.com
  region: us-east-1
  # Use IRSA (IAM Roles for Service Accounts) instead of static keys
  # access_key and secret_key are omitted when using IRSA
  insecure: false
  signature_version2: false
  http_config:
    idle_conn_timeout: 1m30s
    response_header_timeout: 2m
    insecure_skip_verify: false
  part_size: 134217728  # 128 MiB part size for multipart uploads
  sse_config:
    type: SSE-S3  # Server-side encryption

Google Cloud Storage Configuration

objstore-gcs.yml
type: GCS
config:
  bucket: my-thanos-metrics
  # Use Workload Identity instead of a service account JSON key
  # When using Workload Identity, omit service_account field
  http_config:
    idle_conn_timeout: 1m30s
    response_header_timeout: 2m

Azure Blob Storage Configuration

objstore-azure.yml
type: AZURE
config:
  storage_account: mythanosstorageaccount
  storage_account_key: ""  # Omit if using Managed Identity
  container: thanos-metrics
  endpoint: blob.core.windows.net
  max_retries: 3
  http_config:
    idle_conn_timeout: 1m30s
    response_header_timeout: 2m

Create the Kubernetes Secret from your object storage configuration file:

Terminal — Create Object Storage Secret
kubectl create secret generic thanos-objstore-secret \
  --from-file=objstore.yml=./objstore-s3.yml \
  -n monitoring

Security best practices for object storage credentials. On AWS, use IAM Roles for Service Accounts (IRSA) instead of static access keys. On GCP, use Workload Identity. On Azure, use Managed Identity. These approaches eliminate long-lived credentials entirely. The Thanos sidecar and Store Gateway pods assume the IAM role or identity attached to their Kubernetes service account, requiring no credentials in the object storage YAML. If you must use static credentials (e.g., on-premises MinIO), rotate them regularly and use Kubernetes Secrets with strict RBAC.
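As a rough illustration of the IRSA approach on EKS, the Prometheus service account created by the chart can carry the role annotation. This is a sketch under assumptions: the account ID and role name are hypothetical, and the role must already exist with read/write access to the bucket and a trust policy for this service account.

```yaml
# values.yaml (kube-prometheus-stack): IRSA sketch for the Prometheus pod,
# which also hosts the Thanos sidecar container
prometheus:
  serviceAccount:
    create: true
    annotations:
      # Hypothetical role ARN; replace with a role scoped to your Thanos bucket
      eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/thanos-sidecar
      # On GKE with Workload Identity, the equivalent annotation would be:
      # iam.gke.io/gcp-service-account: thanos@my-project.iam.gserviceaccount.com
```

The Store Gateway and Compactor pods need an equivalent annotated service account of their own, since they read from (and, for the Compactor, write to) the same bucket.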

Deploying Thanos Query and Store Gateway

With the sidecar uploading data to object storage, you need two more components to query that data: the Store Gateway (to read from object storage) and Thanos Query (to aggregate and expose a unified API). Most teams deploy these using the bitnami/thanos Helm chart or plain Kubernetes manifests.

Thanos Store Gateway Deployment

thanos-store-gateway.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store-gateway
  namespace: monitoring
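spec:
  # Governing headless Service; it must exist and expose a port named "grpc"
  # for the dnssrv+ lookup used by Thanos Query
  serviceName: thanos-store-gateway
  replicas: 2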
  selector:
    matchLabels:
      app: thanos-store-gateway
  template:
    metadata:
      labels:
        app: thanos-store-gateway
    spec:
      containers:
        - name: thanos-store
          image: quay.io/thanos/thanos:v0.36.1
          args:
            - store
            - --data-dir=/var/thanos/store
            - --objstore.config-file=/etc/thanos/objstore.yml
            - --index-cache-size=500MB
            - --chunk-pool-size=2GB
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
          ports:
            - { name: grpc, containerPort: 10901 }
            - { name: http, containerPort: 10902 }
          volumeMounts:
            - name: objstore-config
              mountPath: /etc/thanos
            - name: data
              mountPath: /var/thanos/store
          resources:
            requests:
              cpu: 500m
              memory: 2Gi
            limits:
              memory: 4Gi
      volumes:
        - name: objstore-config
          secret:
            secretName: thanos-objstore-secret
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 20Gi

The Store Gateway is deployed as a StatefulSet because it benefits from stable storage for caching block index data. The index-cache-size and chunk-pool-size parameters control how much memory the Store Gateway uses for caching — larger caches mean fewer object storage API calls and faster queries, but require more memory. Start with the values above and tune based on your dataset size.
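The manifests in this guide do not include a Service for the Store Gateway, yet the dnssrv+ endpoint that Thanos Query uses for it resolves DNS SRV records belonging to a Service named thanos-store-gateway. For Thanos Query to discover each pod individually, that Service should be headless and expose a port named grpc. A minimal sketch, assuming the labels and port names from the StatefulSet above:

```yaml
# thanos-store-gateway-service.yaml: headless Service sketch
apiVersion: v1
kind: Service
metadata:
  name: thanos-store-gateway
  namespace: monitoring
spec:
  # Headless: DNS returns one record per pod instead of a single virtual IP
  clusterIP: None
  selector:
    app: thanos-store-gateway
  ports:
    # The port name "grpc" produces the _grpc._tcp SRV record
    - name: grpc
      port: 10901
      targetPort: grpc
    - name: http
      port: 10902
      targetPort: http
```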

Thanos Query Deployment

thanos-query.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app: thanos-query
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
        - name: thanos-query
          image: quay.io/thanos/thanos:v0.36.1
          args:
            - query
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:9090
            # Connect to the Thanos Sidecar in kube-prometheus-stack
            - --endpoint=dnssrv+_grpc._tcp.kube-prometheus-stack-thanos-discovery.monitoring.svc.cluster.local
            # Connect to the Store Gateway
            - --endpoint=dnssrv+_grpc._tcp.thanos-store-gateway.monitoring.svc.cluster.local
            # Deduplicate HA Prometheus pairs using the 'prometheus_replica' label
            - --query.replica-label=prometheus_replica
            - --query.auto-downsampling
          ports:
            - { name: http, containerPort: 9090 }
            - { name: grpc, containerPort: 10901 }
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              memory: 1Gi
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: http
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              path: /-/ready
              port: http
            initialDelaySeconds: 10

The --endpoint flags tell Thanos Query where to find StoreAPI providers. Using dnssrv+ prefixed endpoints enables DNS-based service discovery, which automatically picks up new sidecar or Store Gateway pods as they scale. The --query.replica-label flag enables deduplication — if you run two Prometheus replicas for high availability, Thanos Query returns only one copy of each time series instead of duplicates.
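The Grafana data source later in this guide points at thanos-query.monitoring.svc.cluster.local:9090, which only resolves if a Service with that name exists. The following is a minimal sketch, assuming the labels and port names from the Deployment above:

```yaml
# thanos-query-service.yaml: ClusterIP Service sketch for Thanos Query
apiVersion: v1
kind: Service
metadata:
  name: thanos-query
  namespace: monitoring
spec:
  type: ClusterIP
  selector:
    app: thanos-query
  ports:
    # Prometheus-compatible HTTP API used by Grafana
    - name: http
      port: 9090
      targetPort: http
    # StoreAPI, useful if another Thanos Query is stacked on top of this one
    - name: grpc
      port: 10901
      targetPort: grpc
```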

Pointing Grafana at Thanos Query

Once Thanos Query is running, configure Grafana to use it as a data source instead of (or in addition to) the direct Prometheus connection. In your kube-prometheus-stack values.yaml:

values.yaml — Grafana Thanos Data Source
grafana:
  additionalDataSources:
    - name: Thanos
      type: prometheus
      url: http://thanos-query.monitoring.svc.cluster.local:9090
      access: proxy
      isDefault: true
      jsonData:
        timeInterval: 30s
        httpMethod: POST  # POST avoids URL length limits on large queries

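Setting isDefault: true makes Thanos Query the default data source, so the pre-built kube-prometheus-stack dashboards that follow the default data source gain access to the full historical range without dashboard modifications. Grafana allows only one default data source per organization, and the chart marks its built-in Prometheus data source as default too, so you will likely also need to set grafana.sidecar.datasources.isDefaultDatasource: false (check the values reference for your chart version).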

Multi-Cluster Monitoring with Thanos

Multi-cluster monitoring is one of the most compelling reasons to adopt Thanos. The architecture is straightforward: each Kubernetes cluster runs its own kube-prometheus-stack with the Thanos sidecar enabled, all writing to the same object storage bucket (or separate buckets — Thanos supports both). A central Thanos Query deployment aggregates data from all clusters.

Architecture Pattern

The recommended multi-cluster setup uses a hub-and-spoke model:

  • Spoke clusters — Each application cluster runs kube-prometheus-stack with Thanos Sidecar, uploading metrics to a shared object storage bucket. Each spoke uses unique externalLabels (e.g., cluster: prod-us-east, cluster: prod-eu-west) to identify its metrics.
  • Hub cluster — A central monitoring cluster runs Thanos Query, Store Gateway, Compactor, and Grafana. Thanos Query connects to sidecars in all spoke clusters for real-time data and to the Store Gateway for historical data.
  • Object storage — A single bucket (or per-cluster buckets with a shared prefix) holds all historical TSDB blocks. The Store Gateway and Compactor operate on this bucket.

Cross-Cluster Connectivity

The Thanos Query in the hub cluster needs gRPC connectivity to the Thanos sidecars in spoke clusters. There are several approaches depending on your network topology:

  • VPC Peering / Private Link — If all clusters are in the same cloud provider, use VPC peering or AWS PrivateLink / GCP Private Service Connect to expose the sidecar gRPC port across clusters. This keeps traffic private and avoids egress costs.
  • Ingress with mTLS — Expose the sidecar's gRPC endpoint via an ingress controller with mutual TLS authentication. This works across cloud providers and on-premises clusters but requires certificate management.
  • Upload-only mode — If cross-cluster network connectivity is impractical, sidecars can operate in upload-only mode. Real-time data is not queryable across clusters, but all historical data is available via the Store Gateway reading from the shared object storage bucket. This is simpler but introduces a 2-hour delay for cross-cluster data visibility.
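For the options that keep real-time queries, the chart's prometheus.thanosServiceExternal block can publish the sidecar's gRPC port from each spoke cluster. The sketch below assumes a cloud load balancer restricted to the hub's address range; the CIDR is hypothetical, and TLS or mTLS configuration is deliberately left out.

```yaml
# values.yaml (spoke cluster): expose the sidecar StoreAPI to the hub, sketch only
prometheus:
  thanosServiceExternal:
    enabled: true
    type: LoadBalancer
    port: 10901
    targetPort: grpc
    # Hypothetical CIDR of the hub cluster's egress addresses
    loadBalancerSourceRanges:
      - 10.20.0.0/16
```

On the hub side, Thanos Query would then take one additional flag per spoke, for example --endpoint=dns+sidecar.prod-us-east.example.internal:10901, where the hostname is a placeholder for whatever DNS name you assign to the spoke's load balancer.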

Multi-Cluster Query Example

Once configured, querying across clusters is transparent. Grafana dashboards can use the cluster external label as a variable to filter or aggregate:

PromQL — Cross-Cluster CPU Usage
# CPU usage per cluster, across all clusters
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (cluster)

# Memory working set as a fraction of the hard memory-limit quota, per cluster and namespace
sum(container_memory_working_set_bytes{container!=""})
  by (cluster, namespace)
  / on(cluster, namespace) group_left
sum(kube_resourcequota{resource="limits.memory", type="hard"})
  by (cluster, namespace)

Thanos vs. Cortex vs. Grafana Mimir

Thanos is not the only long-term storage solution for Prometheus. Understanding the alternatives helps you make an informed architectural decision.

Thanos

Architecture: Sidecar model — extends existing Prometheus instances without replacing them. Data flows from Prometheus local TSDB to object storage via sidecar upload. Queries fan out to sidecars (real-time) and Store Gateways (historical).

Strengths: Minimal changes to existing Prometheus setup. No remote_write latency overhead. True pull-based model that preserves Prometheus's native behavior. Excellent for teams already running kube-prometheus-stack. CNCF project with strong community.

Weaknesses: Requires gRPC connectivity between Thanos Query and sidecars for real-time queries. Compactor is a singleton, which can be a bottleneck at extreme scale. No native multi-tenancy.

Cortex

Architecture: Remote-write model — Prometheus sends metrics to Cortex via the remote_write API. Cortex distributes, replicates, and stores data in object storage using a microservices architecture (distributor, ingester, querier, compactor, store gateway).

Strengths: Native multi-tenancy. No need for gRPC connectivity to Prometheus — uses standard remote_write. Horizontally scalable with no singleton components. Battle-tested at very large scale (used to power Grafana Cloud before Mimir).

Weaknesses: More operationally complex — more microservices to manage. Remote_write adds latency and resource overhead on Prometheus. Largely superseded by Grafana Mimir.

Grafana Mimir

Architecture: Fork of Cortex with significant performance improvements. Same remote_write ingestion model but with a simplified operational model, split-and-merge compaction, and native support for out-of-order samples.

Strengths: Best-in-class query performance. Native multi-tenancy. Simplified operational model compared to Cortex. Backed by Grafana Labs with commercial support. Handles out-of-order samples natively (important for edge collection).

Weaknesses: Requires remote_write (same overhead as Cortex). More complex to deploy than Thanos for simple use cases. Grafana Labs project — not a CNCF project, so governance model differs.

When to Choose Thanos

Thanos is the best choice when you already run kube-prometheus-stack and want to add long-term storage with minimal architectural changes. It is also ideal when you need a sidecar-based approach that does not add remote_write overhead to Prometheus, and when multi-tenancy is not a hard requirement. For most teams running fewer than 10 clusters and fewer than 10 million active time series, Thanos offers the best balance of simplicity and capability.

Production Best Practices

  1. Always use external labels — Every Prometheus instance must have unique external labels. Use a consistent labeling scheme: cluster, environment, region. These labels are permanent in object storage — plan them carefully because changing them later requires re-uploading data.
  2. Deploy Thanos Compactor as a singleton — Run exactly one Compactor instance per object storage bucket. Multiple Compactors operating on the same bucket corrupt data. Use a Kubernetes Deployment with replicas: 1 and a PVC for its local working directory. Set appropriate retention flags: --retention.resolution-raw=30d, --retention.resolution-5m=180d, --retention.resolution-1h=365d.
  3. Size the Store Gateway index cache properly — The Store Gateway performance is dominated by index cache hit rate. Monitor the thanos_store_index_cache_hits_total and thanos_store_index_cache_requests_total metrics. If your hit rate drops below 90%, increase --index-cache-size. For large deployments, consider using a memcached-based index cache instead of in-memory.
  4. Monitor Thanos with Thanos — All Thanos components expose Prometheus metrics on their HTTP ports. Scrape these metrics and build dashboards to monitor upload lag (sidecar), query latency (querier), compaction progress (compactor), and cache hit rates (store gateway). The Thanos project provides mixin dashboards you can deploy alongside kube-prometheus-stack.
  5. Use bucket lifecycle policies — Configure object storage lifecycle policies as a safety net, but rely on Thanos Compactor's --retention.* flags for retention management. Lifecycle policies should be set slightly beyond Compactor retention to catch orphaned blocks, not as the primary retention mechanism.
  6. Enable query auto-downsampling — Pass --query.auto-downsampling to Thanos Query. This automatically selects 5-minute or 1-hour resolution data for queries spanning long time ranges, dramatically improving query performance for dashboards showing weeks or months of data.
  7. Set resource limits on all components — Thanos components can consume significant memory under load. The Store Gateway, in particular, can spike memory usage when loading large blocks. Set memory limits and monitor for OOM kills. Start with the resource values in this guide and adjust based on your data volume.
  8. Test disaster recovery procedures — Periodically verify that you can restore monitoring by deploying a fresh Thanos Store Gateway and Querier pointing at your object storage bucket. If the Store Gateway can serve historical queries without any local Prometheus data, your backup strategy is working.
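Practice 2 above can be sketched as the following manifests. They assume the thanos-objstore-secret created earlier in this guide and the same image version; the PVC name, its size, and the data directory path are illustrative choices rather than recommendations.

```yaml
# thanos-compactor.yaml: singleton Compactor sketch
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: thanos-compactor-data   # hypothetical name
  namespace: monitoring
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 100Gi            # scratch space; size depends on your largest blocks
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-compactor
  namespace: monitoring
spec:
  replicas: 1                   # exactly one Compactor per bucket
  strategy:
    type: Recreate              # never run the old and new pod side by side
  selector:
    matchLabels:
      app: thanos-compactor
  template:
    metadata:
      labels:
        app: thanos-compactor
    spec:
      containers:
        - name: thanos-compactor
          image: quay.io/thanos/thanos:v0.36.1
          args:
            - compact
            - --wait            # keep running and compact continuously
            - --data-dir=/var/thanos/compact
            - --objstore.config-file=/etc/thanos/objstore.yml
            - --retention.resolution-raw=30d
            - --retention.resolution-5m=180d
            - --retention.resolution-1h=365d
            - --http-address=0.0.0.0:10902
          ports:
            - { name: http, containerPort: 10902 }
          volumeMounts:
            - name: objstore-config
              mountPath: /etc/thanos
            - name: data
              mountPath: /var/thanos/compact
      volumes:
        - name: objstore-config
          secret:
            secretName: thanos-objstore-secret
        - name: data
          persistentVolumeClaim:
            claimName: thanos-compactor-data
```

The Recreate strategy is a deliberate choice here: a rolling update would briefly run two Compactors against the same bucket, which is exactly the situation practice 2 warns against.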

Conclusion

Thanos transforms kube-prometheus-stack from a short-term monitoring tool into a complete long-term observability platform. By adding a sidecar to your existing Prometheus deployment and configuring an object storage backend, you get unlimited metric retention at object storage prices, global query capabilities across multiple clusters, automatic downsampling for cost-efficient historical queries, and a unified Grafana experience that spans days, months, or years of data.

The integration is intentionally non-disruptive — Thanos extends Prometheus rather than replacing it. Your existing alerting rules, recording rules, and dashboards continue to work unchanged. Ensure your ServiceMonitors are properly configured so all application metrics flow through to long-term storage. Start with the sidecar and object storage configuration in this guide, deploy a Store Gateway and Query instance, and you will have a production-grade long-term storage solution running within an afternoon.

For teams managing multiple Kubernetes clusters, Thanos is particularly transformative. A single Grafana dashboard can compare CPU usage across production, staging, and development clusters; SREs can investigate incidents with months of historical context; and capacity planning teams get the long-horizon data they need to forecast infrastructure growth accurately.
