# Thanos: Global Prometheus with Unlimited Retention and High Availability

> Thanos extends Prometheus with global query, unlimited storage via object storage, and HA replication. It is the proven way to run Prometheus at multi-cluster, multi-year scale without changing your existing workflow.

## Quick Use

```bash
# Run Thanos sidecar next to a Prometheus instance
thanos sidecar \
  --tsdb.path=/prometheus \
  --prometheus.url=http://localhost:9090 \
  --objstore.config-file=bucket.yml \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902

# Run Thanos Query to federate multiple Prometheus servers
# (--endpoint replaces the deprecated --store flag; older releases use --store)
thanos query \
  --http-address=0.0.0.0:9090 \
  --endpoint=thanos-sidecar-1:10901 \
  --endpoint=thanos-sidecar-2:10901 \
  --endpoint=thanos-store:10901
```

```yaml
# bucket.yml: S3 object storage
type: S3
config:
  bucket: thanos-metrics
  endpoint: s3.amazonaws.com
  access_key: ...
  secret_key: ...
```

## Introduction

Prometheus is the de facto standard for cloud-native monitoring, but a single Prometheus instance limits both retention and query scope. Thanos extends Prometheus with object storage (S3/GCS/Azure/MinIO) for unlimited retention, global querying across clusters, and HA via deduplication, all while staying compatible with the Prometheus HTTP API.

With over 13,000 GitHub stars, Thanos is used by eBay, Adidas, GitLab, and hundreds of companies running multi-cluster Kubernetes. It's a CNCF Incubating project.

## What Thanos Does

Thanos adds components that work alongside (or on top of) Prometheus:

- **Sidecar** uploads blocks to object storage
- **Store Gateway** serves historical data from the bucket
- **Query** federates multiple sources
- **Compactor** compacts and downsamples old data
- **Ruler** evaluates alerts across the federation
- **Receive** handles remote_write when a sidecar isn't practical
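
As a rough sketch of two of the components above, the Store Gateway and Compactor can be pointed at the same `bucket.yml` the sidecar uploads to. The data directories and the `BUCKET_CFG` variable are assumptions for illustration, not taken from the Thanos docs or this page, and the commands are wrapped in shell functions so nothing starts until you call one.

```bash
# Assumed location of the object storage config (same file the sidecar uses).
BUCKET_CFG="${BUCKET_CFG:-bucket.yml}"

# Store Gateway: serves historical blocks from the bucket to Thanos Query.
# /var/thanos/store is a hypothetical local cache directory.
store_gateway() {
  thanos store \
    --data-dir=/var/thanos/store \
    --objstore.config-file="$BUCKET_CFG" \
    --grpc-address=0.0.0.0:10901 \
    --http-address=0.0.0.0:10902
}

# Compactor: compacts and downsamples blocks in the bucket.
# --wait keeps it running instead of exiting after one pass.
# Run only one Compactor per bucket.
# /var/thanos/compact is a hypothetical scratch directory.
compactor() {
  thanos compact \
    --data-dir=/var/thanos/compact \
    --objstore.config-file="$BUCKET_CFG" \
    --wait
}
```

After sourcing the snippet, `store_gateway` or `compactor` starts the corresponding process in the foreground. In practice each would usually run as its own service or pod rather than from an interactive shell.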

## Architecture Overview

```
      [Object Storage (S3 / GCS / Azure)]
                      ^
                      |
              [Thanos Sidecar]   <-- upload old blocks
                      |
  cluster A: Prometheus ----------+
                      |
              [Thanos Sidecar]
                      |
  cluster B: Prometheus ----------+

  [Thanos Store]      --> reads old blocks from bucket
  [Thanos Compactor]  --> downsample + compact
  [Thanos Ruler]      --> global alert eval
                      ^
                      |
               [Thanos Query]    <-- Grafana queries here
                      |
       Prometheus API (fully compatible)
```

## Self-Hosting & Configuration

```yaml
# Minimal Kubernetes Thanos stack
# (abridged: selector, labels, and volume definitions are omitted)
apiVersion: apps/v1
kind: StatefulSet
metadata: { name: prometheus }
spec:
  serviceName: prometheus
  template:
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.53.0
          args:
            # Setting args replaces the image defaults, so restate config and data paths
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus
            - --storage.tsdb.retention.time=24h
            # Equal min/max block durations disable local compaction,
            # which the sidecar expects when uploading blocks
            - --storage.tsdb.min-block-duration=2h
            - --storage.tsdb.max-block-duration=2h
            - --web.enable-lifecycle
          volumeMounts:
            - { name: data, mountPath: /prometheus }
        - name: thanos-sidecar
          image: quay.io/thanos/thanos:v0.36.1
          args:
            - sidecar
            - --tsdb.path=/prometheus
            - --prometheus.url=http://localhost:9090
            - --objstore.config-file=/etc/thanos/bucket.yml
          volumeMounts:
            - { name: data, mountPath: /prometheus }
            - { name: bucket-cfg, mountPath: /etc/thanos }
# Thanos Query exposes the usual Prometheus endpoint to Grafana.
# Point the Grafana data source at thanos-query:10902 (Thanos Query's default HTTP port).
```

## Key Features

- **Unlimited retention**: blocks live in object storage, so you pay only for S3 bytes
- **Global query**: federated PromQL across clusters
- **High availability**: query-time deduplication across replicated Prometheus pairs
- **Downsampling**: 5m and 1h aggregates for fast long-range queries
- **Remote write**: the Receive component accepts Prometheus remote_write
- **Compatible API**: drop-in for Grafana and Alertmanager
- **CNCF Incubating**: stable governance, 300+ contributors
- **Cache layers**: index cache (Redis/memcached) for bucket reads

## Comparison with Similar Tools

| Feature | Thanos | Cortex | Grafana Mimir | VictoriaMetrics | M3DB |
|---|---|---|---|---|---|
| Architecture | Sidecar + object store | Microservices | Microservices (Cortex fork) | Single-binary cluster | Distributed TSDB |
| Storage | S3/GCS/Azure | S3/GCS/Azure | S3/GCS/Azure | Local + replication | DFS + RocksDB |
| Query API | Prometheus | Prometheus | Prometheus | Prometheus + own | Prometheus (via proxy) |
| Setup | Moderate | Complex | Complex | Very simple | Complex |
| Cost | Low (S3 pricing) | Low | Low | Moderate | High |
| Best For | Pragmatic HA + retention | Multi-tenant | Grafana shops | Simplicity + speed | Uber-scale |

## FAQ

**Q: Thanos vs VictoriaMetrics: which is better?**
A: VictoriaMetrics is simpler (single binary) and faster; Thanos uses cheaper object storage and is the reference HA solution for vanilla Prometheus. Pick Thanos if you already run Prometheus, and VictoriaMetrics if you are starting fresh.

**Q: Do I need the Receive component?**
A: Only if Prometheus can't run close to your apps (e.g., short-lived edge workloads). Otherwise Sidecar + object storage is the recommended pattern.

**Q: How much does object storage cost?**
A: Often cents per GB per month. A cluster scraping 100K series at 1m resolution for a year might cost $5–20/month in S3, dramatically cheaper than SSD block storage.

**Q: Does Thanos replace Prometheus?**
A: No, it augments it. Your Prometheus instances still handle scraping and short-term local storage; Thanos handles long-term storage and the global view.

## Sources

- GitHub: https://github.com/thanos-io/thanos
- Docs: https://thanos.io
- Foundation: CNCF (Incubating)
- License: Apache-2.0

---

Source: https://tokrepo.com/en/workflows/63ff1c2c-37c8-11f1-9bc6-00163e2b0d79
Author: AI Open Source