Configs · Apr 14, 2026 · 3 min read

Thanos — Global Prometheus with Unlimited Retention and High Availability

Thanos extends Prometheus with a global query view, long-term retention in object storage, and deduplication across highly available Prometheus replicas. It is a widely used way to run Prometheus at multi-cluster, multi-year scale without changing your existing workflow.

TL;DR
Thanos extends Prometheus with global query, unlimited object-storage retention, and HA for multi-cluster observability.
§01

What it is

Thanos adds three things to Prometheus: a global query layer, long-term retention in object storage (S3, GCS, Azure), and query-time deduplication for highly available Prometheus pairs. It runs as a set of components (Sidecar, Query, Store Gateway, Compact, Ruler) alongside your Prometheus instances, so your existing Prometheus setup stays in place.

Thanos is designed for SRE and platform teams running Prometheus across multiple clusters who need a unified query interface and long-term metric retention.

§02

How it saves time or tokens

Prometheus alone has limited local storage and no built-in multi-cluster querying. Running Prometheus at scale requires either over-provisioning disk or losing old metrics. Thanos solves both problems: a sidecar uploads Prometheus blocks to cheap object storage for indefinite retention, and the Query component federates queries across all Prometheus instances. You keep your existing Prometheus setup and add Thanos components alongside it.
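The upload path described above relies on two Prometheus-side settings that the Thanos documentation calls for: external labels that uniquely identify each instance, and equal minimum and maximum block durations so that Prometheus does not compact blocks locally before the sidecar ships them. A minimal sketch, assuming an illustrative config file name (`prometheus-thanos-example.yml`), placeholder label values, and a hypothetical wrapper function `run_prometheus`:

```shell
# Write a minimal Prometheus config with identifying external labels.
# File name and label values are placeholders, not taken from the article.
cat > prometheus-thanos-example.yml <<'EOF'
global:
  external_labels:
    cluster: cluster-a   # which cluster this instance scrapes
    replica: "0"         # distinguishes the two members of an HA pair
EOF

# Hypothetical wrapper: nothing starts until the function is called.
run_prometheus() {
  prometheus \
    --config.file=prometheus-thanos-example.yml \
    --storage.tsdb.path=/prometheus \
    --storage.tsdb.min-block-duration=2h \
    --storage.tsdb.max-block-duration=2h
}
```

Calling `run_prometheus` would start Prometheus with local compaction effectively disabled, which leaves compaction to Thanos.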

§03

How to use

  1. Run the Thanos sidecar next to each Prometheus instance:
thanos sidecar \
  --tsdb.path=/prometheus \
  --prometheus.url=http://localhost:9090 \
  --objstore.config-file=bucket.yml
  2. Configure object storage in bucket.yml:
type: S3
config:
  bucket: thanos-metrics
  endpoint: s3.amazonaws.com
  access_key: '...'
  secret_key: '...'
  3. Run the Thanos Query component for global queries:
thanos query \
  --store=sidecar-1:10901 \
  --store=sidecar-2:10901 \
  --store=store-gateway:10901

Open the Thanos Query HTTP endpoint (port 10902 by default) for a Prometheus-style UI that queries every connected store.
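The query command lists `store-gateway:10901`, but the steps do not show how that component is started. A minimal sketch, assuming the same `bucket.yml`, an illustrative data directory, and a hypothetical wrapper function `run_store_gateway`; the ports shown are the Thanos defaults:

```shell
# Hypothetical wrapper: nothing starts until the function is called.
run_store_gateway() {
  # Serves blocks from the bucket to Thanos Query over gRPC.
  # --data-dir holds a local cache of block metadata; the path is a placeholder.
  thanos store \
    --data-dir=/var/thanos/store \
    --objstore.config-file=bucket.yml \
    --grpc-address=0.0.0.0:10901 \
    --http-address=0.0.0.0:10902
}
```

Once `run_store_gateway` is running on a host reachable as `store-gateway`, the third `--store` address in the query command resolves to it.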

§04

Example

A complete Thanos deployment architecture:

Cluster A:  Prometheus + Thanos Sidecar ─┐
                                          ├─> Thanos Query ─> Grafana
Cluster B:  Prometheus + Thanos Sidecar ─┤
                                          │
Object Storage (S3) <── Sidecar uploads ──┘
         │
         └──> Thanos Store Gateway ──> Thanos Query
                (serves historical data)

Grafana points at Thanos Query as its Prometheus data source and gets a unified view across all clusters and time ranges.
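One way to wire this up is Grafana's file-based data source provisioning. The sketch below writes such a file; the directory, the data source name, and the `thanos-query` hostname are assumptions for illustration, and 10902 is the default Thanos HTTP port:

```shell
# Create a Grafana provisioning file that registers Thanos Query
# as a Prometheus-type data source. Paths and names are placeholders.
mkdir -p provisioning/datasources
cat > provisioning/datasources/thanos.yml <<'EOF'
apiVersion: 1
datasources:
  - name: Thanos
    type: prometheus        # Thanos Query speaks the Prometheus HTTP API
    access: proxy
    url: http://thanos-query:10902
    isDefault: true
EOF
```

Grafana reads files like this at startup when the directory is mounted as its provisioning path, so existing Prometheus dashboards can be repointed without editing them by hand.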

§06

Common pitfalls

  • Not deploying a Store Gateway for historical queries. Without it, Thanos Query can only reach live Prometheus instances. The Store Gateway serves data from object storage for long-term queries.
  • Forgetting to configure compaction. Thanos Compact merges and downsamples historical blocks in object storage and enforces retention. Without it, the bucket accumulates many small blocks and queries over long time ranges slow down.
  • Running Thanos Query without deduplication. If you run HA Prometheus pairs, set --query.replica-label to the external label that distinguishes the replicas so that Thanos Query merges their duplicate series.
  • Starting with an overly complex configuration instead of defaults. Begin with the minimal setup, verify it works, then customize incrementally. This approach catches configuration errors early and keeps troubleshooting straightforward.
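The compaction and deduplication pitfalls above might be addressed with commands along these lines. This is a sketch under stated assumptions: the retention periods and data directory are illustrative, the wrapper function names are hypothetical, and `replica` is assumed to be the external label that tells the members of an HA pair apart.

```shell
# Hypothetical wrappers: nothing starts until a function is called.

run_compactor() {
  # Run only one compactor per bucket.
  # --wait keeps the process running instead of exiting after one pass.
  # Retention values are examples; 0d means "keep forever".
  thanos compact \
    --data-dir=/var/thanos/compact \
    --objstore.config-file=bucket.yml \
    --wait \
    --retention.resolution-raw=30d \
    --retention.resolution-5m=180d \
    --retention.resolution-1h=0d
}

run_query_dedup() {
  # Same stores as in "How to use", plus the replica label for deduplication.
  thanos query \
    --store=sidecar-1:10901 \
    --store=sidecar-2:10901 \
    --store=store-gateway:10901 \
    --query.replica-label=replica
}
```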

Frequently Asked Questions

Does Thanos replace Prometheus?

No. Thanos runs alongside Prometheus. You keep your existing Prometheus instances for scraping and alerting. Thanos adds global querying, long-term storage, and deduplication across HA replicas on top of Prometheus.

How much does object storage cost for Thanos?

Object storage is cheap compared with block storage. S3 Standard costs about $0.023/GB/month. At that rate, a bucket holding a few hundred gigabytes of metric blocks costs a few dollars per month in storage, before request and data-transfer charges.
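The per-gigabyte price quoted above turns into a monthly figure by simple multiplication. A small sketch, where the helper name `estimate_cost` and the 200 GB bucket size are made up for illustration and request or transfer charges are ignored:

```shell
# Rough monthly storage cost: stored GB x price per GB-month.
# $1 = stored gigabytes, $2 = price per GB-month (defaults to 0.023).
estimate_cost() {
  awk -v gb="$1" -v price="${2:-0.023}" 'BEGIN { printf "%.2f\n", gb * price }'
}

estimate_cost 200   # prints 4.60
```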

Can Thanos downsample old metrics?

Yes. Thanos Compact downsamples older data to 5-minute and 1-hour resolution. The main benefit is faster queries over long time ranges. Downsampled blocks are stored in addition to the raw data, so storage shrinks only if you also set a shorter retention for raw samples.

How does Thanos compare to Cortex/Mimir?

Thanos and Cortex/Mimir solve the same problem (scaling Prometheus) with different architectures. Thanos uses a sidecar model that is simpler to deploy alongside existing Prometheus. Cortex/Mimir uses a centralized, push-based model in which each Prometheus sends samples to the cluster via remote write. Thanos is simpler to adopt; Mimir may scale better for very large deployments.

Does Thanos support Prometheus recording rules?

Yes. Thanos Ruler evaluates recording rules and alerting rules against Thanos Query (the global view), enabling rules that span multiple Prometheus instances.
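A Ruler invocation might look like the sketch below. The rule file glob, hostnames, data directory, label value, and the wrapper name `run_ruler` are all assumptions for illustration.

```shell
# Hypothetical wrapper: nothing starts until the function is called.
run_ruler() {
  # --query points at the Thanos Query HTTP endpoint used to evaluate rules.
  # --label identifies the series this ruler produces.
  thanos rule \
    --data-dir=/var/thanos/rule \
    --rule-file='/etc/thanos/rules/*.yml' \
    --query=thanos-query:10902 \
    --alertmanagers.url=http://alertmanager:9093 \
    --objstore.config-file=bucket.yml \
    --label='ruler_cluster="global"'
}
```

With `--objstore.config-file` set, the Ruler uploads the blocks it produces to the same bucket, so recorded series become available for long-term queries as well.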
