Thanos — Global Prometheus with Unlimited Retention and High Availability
What it is
Thanos extends Prometheus with a global query view, long-term storage in object storage (S3, GCS, Azure), and deduplication across high-availability Prometheus replicas. It is a widely used way to run Prometheus at multi-cluster, multi-year scale without changing your existing Prometheus setup. Thanos runs as a set of components (Sidecar, Query, Store Gateway, Compactor, and optionally Ruler) alongside your Prometheus instances.
Thanos is designed for SRE and platform teams running Prometheus across multiple clusters who need a unified query interface and long-term metric retention.
How it saves time or tokens
Prometheus alone has limited local storage and no built-in multi-cluster querying. Running Prometheus at scale requires either over-provisioning disk or losing old metrics. Thanos solves both problems: a sidecar uploads Prometheus blocks to cheap object storage for indefinite retention, and the Query component federates queries across all Prometheus instances. You keep your existing Prometheus setup and add Thanos components alongside it.
How to use
- Run the Thanos sidecar next to each Prometheus instance:
  thanos sidecar \
    --tsdb.path=/prometheus \
    --prometheus.url=http://localhost:9090 \
    --objstore.config-file=bucket.yml
- Configure object storage in bucket.yml:
  type: S3
  config:
    bucket: thanos-metrics
    endpoint: s3.amazonaws.com
    access_key: '...'
    secret_key: '...'
- Run the Thanos Query component for global queries (releases that predate --endpoint use --store instead):
  thanos query \
    --endpoint=sidecar-1:10901 \
    --endpoint=sidecar-2:10901 \
    --endpoint=store-gateway:10901
Open the Thanos Query HTTP endpoint (port 10902 by default) to get a Prometheus-style UI that queries every connected sidecar and store.
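The sidecar setup above depends on two things on the Prometheus side: each instance needs unique external labels so Thanos can tell sources apart, and local compaction should be disabled by setting the minimum and maximum block duration to the same value so the sidecar can upload blocks unchanged. The following is a minimal sketch of that; the label names and values (cluster, replica, cluster-a, prometheus-0) and the 15s scrape interval are assumptions chosen for illustration, and the Prometheus command line is printed rather than executed.

```shell
# Write a minimal Prometheus config with external labels.
# Label names and values here are illustrative assumptions.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
  external_labels:
    cluster: cluster-a      # identifies the source cluster in global queries
    replica: prometheus-0   # distinguishes HA replicas for deduplication
EOF

# Equal min/max block durations stop Prometheus from compacting blocks
# locally, so the sidecar can upload each 2h block as written.
PROM_FLAGS="--config.file=prometheus.yml --storage.tsdb.path=/prometheus --storage.tsdb.min-block-duration=2h --storage.tsdb.max-block-duration=2h"

# Print the command instead of running it, so the sketch has no side effects.
echo "prometheus $PROM_FLAGS"
```

The replica label defined here is the one you would later pass to Thanos Query for deduplication.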
Example
A complete Thanos deployment architecture:
Live data:
  Cluster A: Prometheus + Thanos Sidecar ──┐
                                           ├──> Thanos Query ──> Grafana
  Cluster B: Prometheus + Thanos Sidecar ──┘

Historical data:
  Thanos Sidecars ── uploads ──> Object Storage (S3) ──> Thanos Store Gateway ──> Thanos Query
                                                         (serves historical data)
Grafana points at Thanos Query as its Prometheus data source and gets a unified view across all clusters and time ranges.
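Because Thanos Query exposes a Prometheus-compatible HTTP API, anything that can query Prometheus can query it, not only Grafana. A small sketch of an instant query with curl follows; the host name thanos-query is a hypothetical address, 10902 is the default Thanos HTTP port, and the dedup parameter asks Thanos Query to merge series from HA replicas.

```shell
# Hypothetical address of the Thanos Query HTTP endpoint; override as needed.
THANOS_QUERY_URL="${THANOS_QUERY_URL:-http://thanos-query:10902}"

# Run an instant PromQL query against Thanos Query with deduplication on.
# $1 is the PromQL expression; curl URL-encodes it for us.
thanos_instant_query() {
  curl -sG "$THANOS_QUERY_URL/api/v1/query" \
    --data-urlencode "query=$1" \
    --data-urlencode "dedup=true"
}
```

For example, thanos_instant_query 'sum by (cluster) (up)' would count healthy targets per cluster, assuming each Prometheus sets a cluster external label.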
Common pitfalls
- Not deploying a Store Gateway for historical queries. Without it, Thanos Query can only reach live Prometheus instances. The Store Gateway serves data from object storage for long-term queries.
- Forgetting to configure compaction. Thanos Compact merges and downsamples historical blocks in object storage and applies retention. Without it, the bucket fills with many small blocks, nothing is ever expired, and queries over long time ranges slow down.
- Running Thanos Query without deduplication. If you run HA Prometheus pairs, enable --query.replica-label to deduplicate metrics from replica instances.
- Starting with an overly complex configuration instead of defaults. Begin with the minimal setup, verify it works, then customize incrementally. This approach catches configuration errors early and keeps troubleshooting straightforward.
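The first three pitfalls each map to one component invocation. The sketch below wraps them in shell functions so nothing starts until you call one; the data directories, host names, and the replica label name are assumptions, and bucket.yml is the object storage config from the setup steps. Older Thanos releases take --store where this sketch uses --endpoint.

```shell
# Store Gateway: serves historical blocks from object storage to Thanos Query.
start_store_gateway() {
  thanos store \
    --data-dir=/var/thanos/store \
    --objstore.config-file=bucket.yml \
    --grpc-address=0.0.0.0:10901
}

# Compactor: merges and downsamples blocks in the bucket.
# --wait keeps it running instead of exiting after one pass.
start_compactor() {
  thanos compact \
    --data-dir=/var/thanos/compact \
    --objstore.config-file=bucket.yml \
    --wait
}

# Query with deduplication: "replica" must match the external label
# that differs between the two Prometheus instances of an HA pair.
start_query_dedup() {
  thanos query \
    --query.replica-label=replica \
    --endpoint=sidecar-1:10901 \
    --endpoint=sidecar-2:10901 \
    --endpoint=store-gateway:10901
}
```

Run only one compactor against a given bucket; concurrent compactors on the same blocks can corrupt data.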
Frequently Asked Questions
Thanos does not replace Prometheus; it runs alongside it. You keep your existing Prometheus instances for scraping and alerting. Thanos adds global querying, long-term storage, and deduplication of HA replicas on top of Prometheus.
Storing metrics in object storage is inexpensive. S3 Standard costs about $0.023/GB/month. With compaction and a retention policy in place, a year of metrics from a medium cluster might cost a few dollars per month in storage, before request and data-transfer charges.
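As a rough worked example of that estimate, the sketch below multiplies the quoted price by a bucket size. The 100 GB figure is purely an assumption for illustration, not a measurement; substitute the size reported for your own bucket.

```shell
PRICE_PER_GB_MONTH=0.023   # S3 price quoted in the text
STORED_GB=100              # assumed bucket size, for illustration only

# awk handles the floating-point multiplication; the shell alone cannot.
monthly_cost=$(awk -v gb="$STORED_GB" -v price="$PRICE_PER_GB_MONTH" \
  'BEGIN { printf "%.2f", gb * price }')

echo "Estimated storage cost: \$${monthly_cost}/month"
```

With these inputs the estimate is $2.30 per month for storage alone.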
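Thanos supports downsampling: Thanos Compact produces 5-minute and 1-hour resolution copies of older data. The main benefit is faster queries over long time ranges while keeping enough resolution for trend analysis. Downsampled blocks are stored in addition to the raw ones, so storage shrinks only if you set a shorter retention for raw data (for example with --retention.resolution-raw).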
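To get a feel for why downsampled data helps long-range queries, the sketch below counts how many points per series a one-year query would touch at each resolution. The 15-second scrape interval is an assumption. It counts time windows, not bytes: each downsampled window keeps several aggregates (such as count, sum, min, and max), so the storage ratio is smaller than the point ratio.

```shell
SCRAPE_INTERVAL_S=15   # assumed raw scrape interval
RANGE_DAYS=365         # query range: one year

range_s=$((RANGE_DAYS * 86400))

raw_points=$((range_s / SCRAPE_INTERVAL_S))  # raw samples per series
points_5m=$((range_s / 300))                 # 5-minute windows per series
points_1h=$((range_s / 3600))                # 1-hour windows per series

echo "raw: $raw_points  5m: $points_5m  1h: $points_1h"
```

Under these assumptions a year of one series is 2102400 raw samples, 105120 five-minute windows, or 8760 one-hour windows.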
Thanos and Cortex/Mimir solve the same problem (scaling Prometheus) with different architectures. Thanos uses a sidecar model that is simpler to deploy alongside existing Prometheus and reads data from each instance at query time. Cortex/Mimir uses a push-based model in which Prometheus sends samples to a central cluster via remote write. Thanos is simpler to adopt; Mimir may scale better for very large deployments.
Thanos supports recording and alerting rules through Thanos Ruler, which evaluates them against Thanos Query (the global view), enabling rules that span multiple Prometheus instances.
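A minimal sketch of a cross-cluster recording rule and the Ruler invocation that would evaluate it is shown below. The rule name, the cluster label, the file and directory names, the thanos-query host, and the ruler label are all assumptions for illustration; bucket.yml is the object storage config from the setup steps. The command is wrapped in a function so that nothing starts until you call it.

```shell
# Rule file in the standard Prometheus rule format.
cat > global-rules.yml <<'EOF'
groups:
  - name: global
    rules:
      # Count healthy targets per cluster across all Prometheus instances.
      - record: cluster:up:sum
        expr: sum by (cluster) (up)
EOF

# Ruler evaluates the rules against Thanos Query and uploads the
# resulting blocks to the same bucket as the sidecars.
start_ruler() {
  thanos rule \
    --data-dir=/var/thanos/rule \
    --rule-file=global-rules.yml \
    --query=thanos-query:10902 \
    --label='ruler="global"' \
    --objstore.config-file=bucket.yml
}
```

For alerting rules you would also pass --alertmanagers.url so the Ruler knows where to send firing alerts.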