# Thanos — Global Prometheus with Unlimited Retention and High Availability

> Thanos extends Prometheus with global query, unlimited storage via object storage, and HA replication. It is the proven way to run Prometheus at multi-cluster, multi-year scale without changing your existing workflow.

## Quick Use
```bash
# Run Thanos sidecar next to a Prometheus instance
thanos sidecar \
  --tsdb.path=/prometheus \
  --prometheus.url=http://localhost:9090 \
  --objstore.config-file=bucket.yml \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902

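# Run Thanos Query to federate multiple Prometheus servers
# (--endpoint replaces the deprecated --store flag in recent releases;
#  10902 avoids clashing with Prometheus on 9090)
thanos query \
  --http-address=0.0.0.0:10902 \
  --endpoint=thanos-sidecar-1:10901 \
  --endpoint=thanos-sidecar-2:10901 \
  --endpoint=thanos-store:10901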
```

```yaml
# bucket.yml — S3 object storage
type: S3
config:
  bucket: thanos-metrics
  endpoint: s3.amazonaws.com
  access_key: ...
  secret_key: ...
```

## Introduction
Prometheus is the de facto standard for cloud-native monitoring, but a single Prometheus instance limits both retention and query scope. Thanos adds object storage (S3, GCS, Azure, MinIO) for effectively unlimited retention, global querying across clusters, and high availability through deduplication of replica data, while remaining compatible with the Prometheus HTTP API.

With over 13,000 GitHub stars, Thanos is used by eBay, Adidas, GitLab, and hundreds of companies running multi-cluster Kubernetes. It's a CNCF Incubating project.

## What Thanos Does
Thanos adds components that run alongside, or on top of, Prometheus. **Sidecar** uploads completed TSDB blocks to object storage and serves recent data to Query. **Store Gateway** serves historical data from the bucket. **Query** fans a PromQL query out to all sources and merges the results. **Compactor** compacts and downsamples blocks in the bucket. **Ruler** evaluates recording and alerting rules across the federation. **Receive** accepts remote_write when running a sidecar isn't practical.
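
As a rough sketch of how the bucket-side components might be started, the functions below wrap `thanos store` and `thanos compact`. The data directories and retention values are illustrative assumptions, not recommendations from the Thanos project; `bucket.yml` is the object storage config shown in Quick Use.

```bash
# Store Gateway: serves blocks already uploaded to the bucket over gRPC.
# /var/thanos/store is an assumed local cache directory.
start_store_gateway() {
  thanos store \
    --data-dir=/var/thanos/store \
    --objstore.config-file=bucket.yml \
    --grpc-address=0.0.0.0:10901 \
    --http-address=0.0.0.0:10902
}

# Compactor: compacts and downsamples blocks in the bucket.
# Run a single instance per bucket. Retention values are assumptions;
# 0d keeps that resolution indefinitely.
start_compactor() {
  thanos compact \
    --data-dir=/var/thanos/compact \
    --objstore.config-file=bucket.yml \
    --wait \
    --retention.resolution-raw=30d \
    --retention.resolution-5m=180d \
    --retention.resolution-1h=0d
}
```

Calling `start_store_gateway` or `start_compactor` launches the respective process in the foreground; in practice each would run under its own supervisor or pod.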

## Architecture Overview
```
                         [Object Storage (S3 / GCS / Azure)]
                                    ^
                                    |
                          [Thanos Sidecar]    <-- upload old blocks
                                    |
  cluster A:  Prometheus  ----------+
                                    |
                          [Thanos Sidecar]
                                    |
  cluster B:  Prometheus  ----------+

                          [Thanos Store]    --> reads old blocks from bucket
                          [Thanos Compactor] --> downsample + compact
                          [Thanos Ruler]     --> global alert eval
                                    ^
                                    |
                          [Thanos Query]     <-- Grafana queries here
                                    |
                           Prometheus API (fully compatible)
```

## Self-Hosting & Configuration
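```yaml
# Minimal Kubernetes Thanos stack: Prometheus with a Thanos sidecar.
# prometheus.yml must set unique global.external_labels, which the sidecar requires.
apiVersion: apps/v1
kind: StatefulSet
metadata: { name: prometheus }
spec:
  serviceName: prometheus
  replicas: 1
  selector:
    matchLabels: { app: prometheus }
  template:
    metadata:
      labels: { app: prometheus }
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.53.0
          args:
            # Setting args replaces the image defaults, so state paths explicitly.
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus
            - --storage.tsdb.retention.time=24h
            # Equal min/max disables local compaction so the sidecar can upload blocks.
            - --storage.tsdb.min-block-duration=2h
            - --storage.tsdb.max-block-duration=2h
            - --web.enable-lifecycle
          volumeMounts:
            - { name: data, mountPath: /prometheus }
        - name: thanos-sidecar
          image: quay.io/thanos/thanos:v0.36.1
          args:
            - sidecar
            - --tsdb.path=/prometheus
            - --prometheus.url=http://localhost:9090
            - --objstore.config-file=/etc/thanos/bucket.yml
          volumeMounts:
            - { name: data, mountPath: /prometheus }
            - { name: bucket-cfg, mountPath: /etc/thanos }
      volumes:
        - name: bucket-cfg
          secret: { secretName: thanos-objstore }  # assumed Secret containing bucket.yml
  volumeClaimTemplates:
    - metadata: { name: data }
      spec:
        accessModes: ["ReadWriteOnce"]
        resources: { requests: { storage: 20Gi } }  # assumed size

# Thanos Query exposes the usual Prometheus endpoint to Grafana;
# point the Grafana data source at thanos-query:10902
```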

## Key Features
- **Unlimited retention** — blocks in object storage, pay for S3 bytes
- **Global query** — federated PromQL across clusters
- **High availability** — query dedup across replicated Prometheus pairs
- **Downsampling** — 5m and 1h aggregates for fast long-range queries
- **Remote write** — Receive component accepts Prometheus remote_write
- **Compatible API** — drop-in for Grafana, Alertmanager
- **CNCF Incubating** — stable governance, 300+ contributors
- **Cache layers** — Index cache (Redis/memcached) for bucket reads

## Comparison with Similar Tools
| Feature | Thanos | Cortex | Grafana Mimir | VictoriaMetrics | M3DB |
|---|---|---|---|---|---|
| Architecture | Sidecar + object store | Microservices | Microservices (Cortex fork) | Single-binary cluster | Distributed TSDB |
| Storage | S3/GCS/Azure | S3/GCS/Azure | S3/GCS/Azure | Local + replication | Local disk + replication |
| Query API | Prometheus | Prometheus | Prometheus | Prometheus + own | Prometheus (via proxy) |
| Setup | Moderate | Complex | Complex | Very simple | Complex |
| Cost | Low (S3 pricing) | Low | Low | Moderate | High |
| Best For | Pragmatic HA + retention | Multi-tenant | Grafana shops | Simplicity + speed | Uber-scale |

## FAQ
**Q: Thanos vs VictoriaMetrics — which is better?**
A: VictoriaMetrics is simpler (single binary) and faster; Thanos uses cheaper object storage and is the reference HA solution for vanilla Prometheus. Pick Thanos if you already run Prometheus; VictoriaMetrics if starting fresh.

**Q: Do I need the Receive component?**
A: Only if Prometheus can't run close to your apps (e.g., short-lived edge workloads). Otherwise Sidecar + object storage is the recommended pattern.
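
For cases where Receive is needed, a minimal sketch might look like the following. The hostname `thanos-receive`, the data directory, and the `receive_replica` label value are assumptions for illustration; 19291 is the default remote-write port for `thanos receive`.

```bash
# Start a single Receive instance that accepts remote_write
# and uploads blocks to the bucket described in bucket.yml.
start_receive() {
  thanos receive \
    --tsdb.path=/var/thanos/receive \
    --objstore.config-file=bucket.yml \
    --remote-write.address=0.0.0.0:19291 \
    --grpc-address=0.0.0.0:10901 \
    --label='receive_replica="0"'
}

# Print the matching snippet for the sending Prometheus's prometheus.yml.
print_remote_write_snippet() {
  cat <<'EOF'
remote_write:
  - url: http://thanos-receive:19291/api/v1/receive
EOF
}
```

Thanos Query would then list the Receive gRPC address as one more endpoint, alongside any sidecars and the Store Gateway.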

**Q: How much does object storage cost?**
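A: Typically a few cents per GB per month. A cluster with 100K active series scraped every minute produces roughly 53 billion samples a year; at about 1 to 2 bytes per compressed sample, plus downsampled copies, that is a few hundred GB at most, or roughly $5–20/month in S3. That is dramatically cheaper than SSD block storage.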
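
A back-of-the-envelope version of that estimate can be scripted. The constants here are assumptions rather than figures from the Thanos documentation: about 1.5 bytes per compressed sample, $0.023 per GB-month (a typical S3 Standard list price), decimal gigabytes, and an overhead multiplier of 3 to account for keeping the 5m and 1h downsampled blocks next to the raw ones. Request and transfer charges are ignored.

```bash
# Estimate object storage size and monthly cost for retained samples.
# Args: <series> <scrape_interval_seconds> <retention_days> [overhead_multiplier]
estimate_monthly_cost() {
  LC_ALL=C awk -v s="$1" -v i="$2" -v d="$3" -v o="${4:-1}" 'BEGIN {
    bytes_per_sample = 1.5      # assumed average after compression
    usd_per_gb_month = 0.023    # assumed S3 Standard price
    samples = s * (86400 / i) * d
    gb = samples * bytes_per_sample * o / 1e9
    printf "%.1f GB, $%.2f/month\n", gb, gb * usd_per_gb_month
  }'
}

# 100K series, 60s scrape interval, one year, 3x overhead for downsampled copies.
estimate_monthly_cost 100000 60 365 3
```

Changing the multiplier or the per-sample size shifts the result proportionally, so the output is best read as an order-of-magnitude check.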

**Q: Does Thanos replace Prometheus?**
A: No — it augments it. Your Prometheus instances still do scrape + short-term local storage; Thanos handles long-term and global view.

## Sources
- GitHub: https://github.com/thanos-io/thanos
- Docs: https://thanos.io
- Foundation: CNCF (Incubating)
- License: Apache-2.0

---
Source: https://tokrepo.com/en/workflows/63ff1c2c-37c8-11f1-9bc6-00163e2b0d79
Author: AI Open Source