Introduction
Prometheus is the de facto standard for cloud-native monitoring, but a single Prometheus instance limits both retention and query scope. Thanos extends Prometheus with object storage (S3, GCS, Azure, MinIO) for effectively unlimited retention, global querying across clusters, and high availability via deduplication, all while staying compatible with the Prometheus HTTP API.
With over 13,000 GitHub stars, Thanos is used by eBay, Adidas, GitLab, and hundreds of other companies running multi-cluster Kubernetes. It is a CNCF Incubating project.
What Thanos Does
Thanos adds components that work alongside (or on top of) Prometheus:
- Sidecar: uploads Prometheus TSDB blocks to object storage
- Store Gateway: serves historical data from the bucket
- Query: federates queries across multiple sources
- Compactor: compacts and downsamples old data
- Ruler: evaluates alerting rules across the federation
- Receive: ingests remote_write when running a sidecar isn't practical
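To illustrate how Query federates the other components, here is a rough sketch of a pod spec fragment for a Thanos Query container. It assumes a recent Thanos release in which sources are registered with `--endpoint` (older releases used `--store`). Every service name and the `monitoring` namespace are hypothetical; only the default ports (10901 for gRPC, 10902 for HTTP) come from Thanos itself.

```yaml
# Sketch only: pod spec fragment for Thanos Query.
# All hostnames below are hypothetical placeholders.
containers:
  - name: thanos-query
    image: quay.io/thanos/thanos:v0.36.1
    args:
      - query
      - --http-address=0.0.0.0:10902    # Prometheus-compatible HTTP API
      - --grpc-address=0.0.0.0:10901
      # One --endpoint per source that Query should fan out to:
      - --endpoint=prometheus-a-sidecar.monitoring.svc:10901   # Sidecar, cluster A
      - --endpoint=prometheus-b-sidecar.monitoring.svc:10901   # Sidecar, cluster B
      - --endpoint=thanos-store.monitoring.svc:10901           # Store Gateway
      - --endpoint=thanos-ruler.monitoring.svc:10901           # Ruler
```

Each endpoint answers for the time range it holds: sidecars cover recent data still on Prometheus disks, and the Store Gateway covers blocks already in the bucket.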
Architecture Overview
               [Object Storage (S3 / GCS / Azure)]
                                ^
                                |
                        [Thanos Sidecar]  <-- upload old blocks
                                |
cluster A: Prometheus ----------+
                                |
                        [Thanos Sidecar]
                                |
cluster B: Prometheus ----------+

[Thanos Store]     --> reads old blocks from bucket
[Thanos Compactor] --> downsample + compact
[Thanos Ruler]     --> global alert eval
        ^
        |
[Thanos Query]     <-- Grafana queries here
        |
Prometheus API (fully compatible)
Self-Hosting & Configuration
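The sidecar configuration in this section passes `--objstore.config-file=/etc/thanos/bucket.yml`. As a minimal sketch, that file might look like the following for an S3-compatible bucket. The bucket name, endpoint, region, and credentials are placeholders chosen for illustration, not values from this article.

```yaml
# /etc/thanos/bucket.yml (sketch; every value is a placeholder)
type: S3
config:
  bucket: "thanos-metrics"                 # hypothetical bucket name
  endpoint: "s3.us-east-1.amazonaws.com"   # for MinIO, use the MinIO host:port instead
  region: "us-east-1"
  access_key: "<ACCESS_KEY>"
  secret_key: "<SECRET_KEY>"
```

Store Gateway and Compactor take the same `--objstore.config-file` flag, so one file (typically mounted from a Secret) can serve all three.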
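# Minimal Kubernetes Thanos stack
apiVersion: apps/v1
kind: StatefulSet
metadata: { name: prometheus }
spec:
  serviceName: prometheus
  selector:
    matchLabels: { app: prometheus }   # required by apps/v1; label name is illustrative
  template:
    metadata:
      labels: { app: prometheus }
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.53.0
          args:
            # Setting args replaces the image defaults, so config and data paths are explicit.
            # prometheus.yml must set global.external_labels, which the sidecar requires.
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus
            - --storage.tsdb.retention.time=24h
            # Equal min/max block durations turn off local compaction so the sidecar can upload blocks
            - --storage.tsdb.min-block-duration=2h
            - --storage.tsdb.max-block-duration=2h
            - --web.enable-lifecycle
          volumeMounts:
            - { name: data, mountPath: /prometheus }
        - name: thanos-sidecar
          image: quay.io/thanos/thanos:v0.36.1
          args:
            - sidecar
            - --tsdb.path=/prometheus
            - --prometheus.url=http://localhost:9090
            - --objstore.config-file=/etc/thanos/bucket.yml
          volumeMounts:
            - { name: data, mountPath: /prometheus }
            - { name: bucket-cfg, mountPath: /etc/thanos }
      volumes:
        - name: data
          emptyDir: {}   # assumption: swap for a volumeClaimTemplate in production
        - name: bucket-cfg
          secret: { secretName: thanos-objstore }   # assumption: Secret holding bucket.yml
# Thanos Query exposes the usual Prometheus endpoint to Grafana;
# point the Grafana data source at thanos-query:10902
Key Features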
- Unlimited retention: blocks live in object storage, so you pay only object-storage rates per byte
- Global query: federated PromQL across clusters
- High availability: Query deduplicates series from replicated Prometheus pairs
- Downsampling: 5m and 1h aggregates for fast long-range queries
- Remote write: the Receive component accepts Prometheus remote_write
- Compatible API: drop-in for Grafana and Alertmanager
- CNCF Incubating: stable governance, 300+ contributors
- Cache layers: index cache (Redis or Memcached) for bucket reads
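To make the high-availability feature concrete: deduplication relies on each Prometheus replica carrying an external label that tells it apart from its twin. A minimal sketch of the Prometheus side follows, assuming the label names `cluster` and `replica` (illustrative choices, not names mandated by Thanos).

```yaml
# prometheus.yml on the first replica of an HA pair (sketch; label names are assumed)
global:
  external_labels:
    cluster: cluster-a   # identifies the cluster in global queries
    replica: "0"         # the second replica sets "1"; the rest of the config is identical
```

When Thanos Query is started with `--query.replica-label=replica`, series that differ only in that label are treated as duplicates and merged at query time.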
Comparison with Similar Tools
| Feature | Thanos | Cortex | Grafana Mimir | VictoriaMetrics | M3DB |
|---|---|---|---|---|---|
| Architecture | Sidecar + object store | Microservices | Microservices (Cortex fork) | Single-binary cluster | Distributed TSDB |
| Storage | S3/GCS/Azure | S3/GCS/Azure | S3/GCS/Azure | Local + replication | Local disk + replication |
| Query API | Prometheus | Prometheus | Prometheus | Prometheus + own | Prometheus (via proxy) |
| Setup | Moderate | Complex | Complex | Very simple | Complex |
| Cost | Low (S3 pricing) | Low | Low | Moderate | High |
| Best For | Pragmatic HA + retention | Multi-tenant | Grafana shops | Simplicity + speed | Uber-scale |
FAQ
Q: Thanos vs VictoriaMetrics: which is better? A: VictoriaMetrics is simpler to run (a single binary) and often faster; Thanos uses cheaper object storage and is the reference HA solution for vanilla Prometheus. Pick Thanos if you already run Prometheus, and VictoriaMetrics if you are starting fresh.
Q: Do I need the Receive component? A: Only if Prometheus can't run close to your apps (e.g., short-lived edge workloads). Otherwise Sidecar + object storage is the recommended pattern.
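For the Receive case, the Prometheus side is an ordinary `remote_write` entry. The sketch below assumes Receive is reachable under a hypothetical service name and listens on its default remote-write port (19291).

```yaml
# prometheus.yml fragment (sketch; the hostname is a hypothetical placeholder)
remote_write:
  - url: http://thanos-receive.monitoring.svc:19291/api/v1/receive
```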
Q: How much does object storage cost? A: Typically a few cents per GB per month. Storing a year of data from a cluster that scrapes 100K series at a 1-minute interval might cost $5–20/month in S3, far less than SSD block storage.
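As a rough sanity check on that range, here is a back-of-the-envelope estimate. It rests on two assumptions that are not from this article: about 1.5 bytes per compressed sample (Prometheus documents an average of 1 to 2 bytes per sample) and an S3 Standard list price of roughly $0.023 per GB-month.

```latex
\begin{align*}
\text{samples per year} &= 10^{5}\ \text{series} \times 525{,}600\ \text{min} \approx 5.26 \times 10^{10} \\
\text{raw block size}   &\approx 5.26 \times 10^{10} \times 1.5\ \text{B} \approx 79\ \text{GB} \\
\text{monthly cost}     &\approx 79\ \text{GB} \times \$0.023/\text{GB-month} \approx \$1.80
\end{align*}
```

That covers raw blocks only, once a full year has accumulated. Keeping the 5m and 1h downsampled copies alongside the raw data, plus index overhead and request charges, can plausibly multiply this several times, which puts the total in the low end of the quoted range.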
Q: Does Thanos replace Prometheus? A: No, it augments it. Your Prometheus instances still handle scraping and short-term local storage; Thanos handles long-term storage and the global view.
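On the Thanos side, how long the long-term data is kept is governed by the Compactor's per-resolution retention flags. The pod spec fragment below is a sketch: the retention values and the data directory are illustrative assumptions, not recommendations.

```yaml
# Sketch only: pod spec fragment for Thanos Compactor (run one instance per bucket).
containers:
  - name: thanos-compactor
    image: quay.io/thanos/thanos:v0.36.1
    args:
      - compact
      - --wait                               # keep running instead of exiting after one pass
      - --data-dir=/var/thanos/compact       # local scratch space (assumed path)
      - --objstore.config-file=/etc/thanos/bucket.yml
      - --retention.resolution-raw=30d       # illustrative retention values
      - --retention.resolution-5m=180d
      - --retention.resolution-1h=365d
```

Shorter retention for raw data and longer retention for the 1h aggregates keeps long-range dashboards fast while limiting how much full-resolution data stays in the bucket.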
Sources
- GitHub: https://github.com/thanos-io/thanos
- Docs: https://thanos.io
- Foundation: CNCF (Incubating)
- License: Apache-2.0