Configs · April 14, 2026 · 1 min read

Thanos — Global Prometheus with Unlimited Retention and High Availability

Thanos extends Prometheus with global querying, effectively unlimited retention via object storage, and high availability through deduplication of replicated Prometheus instances. It is a proven way to run Prometheus at multi-cluster, multi-year scale without changing your existing workflow.

Introduction

Prometheus is the de facto standard for cloud-native monitoring, but a single Prometheus instance limits both retention and query scope. Thanos extends Prometheus with object storage (S3/GCS/Azure/MinIO) for unlimited retention, global querying across clusters, and HA via deduplication, all while staying compatible with the Prometheus HTTP API.

With over 13,000 GitHub stars, Thanos is used by eBay, Adidas, GitLab, and hundreds of companies running multi-cluster Kubernetes. It's a CNCF Incubating project.

What Thanos Does

Thanos adds components that work alongside (or on top of) Prometheus:

  • Sidecar: uploads completed TSDB blocks to object storage
  • Store Gateway: serves historical data from the bucket
  • Query: federates multiple sources behind one PromQL endpoint
  • Compactor: compacts and downsamples old data
  • Ruler: evaluates alerting rules across the federation
  • Receive: accepts remote_write when a Sidecar isn't practical

Architecture Overview

                         [Object Storage (S3 / GCS / Azure)]
                                    ^
                                    |
                          [Thanos Sidecar]    <-- upload old blocks
                                    |
  cluster A:  Prometheus  ----------+
                                    |
                          [Thanos Sidecar]
                                    |
  cluster B:  Prometheus  ----------+

                          [Thanos Store]    --> reads old blocks from bucket
                          [Thanos Compactor] --> downsample + compact
                          [Thanos Ruler]     --> global alert eval
                                    ^
                                    |
                          [Thanos Query]     <-- Grafana queries here
                                    |
                           Prometheus API (fully compatible)
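
To make the Compactor's "downsample" step in the diagram more concrete, here is a minimal Python sketch of what a 5-minute aggregate conceptually holds. The function name `downsample` and the sample data are illustrative assumptions; the real Compactor operates on TSDB blocks in the bucket, not on Python lists, and stores more than the four aggregates shown here.

```python
from collections import defaultdict


def downsample(samples, window_s=300):
    """Group (unix_ts, value) samples into fixed windows and keep
    count/sum/min/max for each window (hypothetical helper)."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align every sample to the start of its window.
        buckets[ts - ts % window_s].append(value)
    return {
        start: {"count": len(v), "sum": sum(v), "min": min(v), "max": max(v)}
        for start, v in sorted(buckets.items())
    }


# One sample per minute for 10 minutes collapses into two 5-minute aggregates.
raw = [(60 * i, float(i)) for i in range(10)]
aggregates = downsample(raw)
```

Keeping `sum` and `count` rather than a precomputed mean lets a query engine derive averages over any longer range without re-reading raw samples, which is why long-range queries over downsampled data are cheaper.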

Self-Hosting & Configuration

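# Minimal Kubernetes Thanos stack: Prometheus plus a Thanos Sidecar.
# prometheus.yml must set global.external_labels (e.g. cluster, replica);
# the Sidecar refuses to start without them.
apiVersion: apps/v1
kind: StatefulSet
metadata: { name: prometheus }
spec:
  serviceName: prometheus
  selector:
    matchLabels: { app: prometheus }   # required by apps/v1; must match the pod labels
  template:
    metadata:
      labels: { app: prometheus }
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.53.0
          args:
            # Overriding args drops the image defaults, so set config and data paths explicitly
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus
            - --storage.tsdb.retention.time=24h
            # Equal min/max block durations turn off local compaction while the Sidecar uploads
            - --storage.tsdb.min-block-duration=2h
            - --storage.tsdb.max-block-duration=2h
            - --web.enable-lifecycle
          volumeMounts:
            - { name: data, mountPath: /prometheus }
        - name: thanos-sidecar
          image: quay.io/thanos/thanos:v0.36.1
          args:
            - sidecar
            - --tsdb.path=/prometheus
            - --prometheus.url=http://localhost:9090
            - --objstore.config-file=/etc/thanos/bucket.yml
          volumeMounts:
            - { name: data, mountPath: /prometheus }
            - { name: bucket-cfg, mountPath: /etc/thanos }
      volumes:
        - name: data
          emptyDir: {}   # assumption for a demo; use volumeClaimTemplates for durable storage
        - name: bucket-cfg
          secret: { secretName: thanos-objstore }   # assumed Secret name holding bucket.yml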

# Thanos Query exposes the usual Prometheus endpoint to Grafana
# point Grafana data source at thanos-query:10902
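
Because Thanos Query speaks the Prometheus HTTP API, any client that can call `/api/v1/query` can use it. The Python sketch below only builds the request URL, so it runs without a cluster. The helper name `build_query_url` and the `cluster` label are illustrative assumptions; the host and port come from the note above, and `dedup` is a Thanos-specific query parameter that toggles replica deduplication.

```python
from urllib.parse import urlencode


def build_query_url(base_url, promql, dedup=True):
    """Return an instant-query URL for a Prometheus-compatible API
    (hypothetical helper; it does not send the request)."""
    params = {"query": promql, "dedup": "true" if dedup else "false"}
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


# "cluster" is an assumed external label set on each Prometheus.
url = build_query_url("http://thanos-query:10902", "sum(up) by (cluster)")
```

Inside a cluster where the `thanos-query` service resolves, the URL can be fetched with `urllib.request.urlopen(url)` and parsed as the usual Prometheus JSON response.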

Key Features

  • Unlimited retention — blocks in object storage, pay for S3 bytes
  • Global query — federated PromQL across clusters
  • High availability — query dedup across replicated Prometheus pairs
  • Downsampling — 5m and 1h aggregates for fast long-range queries
  • Remote write — Receive component accepts Prometheus remote_write
  • Compatible API — drop-in for Grafana, Alertmanager
  • CNCF Incubating — stable governance, 300+ contributors
  • Cache layers — Index cache (Redis/memcached) for bucket reads
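
The high-availability bullet relies on query-time deduplication: two Prometheus replicas scrape the same targets, and Query merges their series once the replica label is ignored. The Python sketch below illustrates the idea only. It assumes a label named `replica` (on the real Query component the label is chosen with `--query.replica-label`) and perfectly aligned timestamps; the actual Thanos algorithm has to handle replicas whose scrape times are slightly offset, which this sketch does not attempt.

```python
def dedup_series(series_list, replica_label="replica"):
    """Merge series that differ only by the replica label.

    Each series is (labels_dict, [(ts, value), ...]). For every timestamp
    the first replica that has a sample wins; later replicas only fill gaps.
    """
    merged = {}
    for labels, samples in series_list:
        # Identity of a series once the replica label is dropped.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        by_ts = merged.setdefault(key, {})
        for ts, value in samples:
            by_ts.setdefault(ts, value)
    return {key: sorted(by_ts.items()) for key, by_ts in merged.items()}


# Replica "a" missed the scrape at ts=120; replica "b" fills the gap.
replica_a = ({"job": "api", "replica": "a"}, [(0, 1.0), (60, 2.0)])
replica_b = ({"job": "api", "replica": "b"}, [(0, 1.0), (60, 2.0), (120, 3.0)])
result = dedup_series([replica_a, replica_b])
```

The merged result is a single `job="api"` series with three samples, which is what a dashboard sees when one replica restarts or misses a scrape.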

Comparison with Similar Tools

Feature      | Thanos                   | Cortex        | Grafana Mimir       | VictoriaMetrics       | M3DB
-------------+--------------------------+---------------+---------------------+-----------------------+-----------------------
Architecture | Sidecar + object store   | Microservices | Mimir (Cortex fork) | Single-binary cluster | Distributed TSDB
Storage      | S3/GCS/Azure             | S3/GCS/Azure  | S3/GCS/Azure        | Local + replication   | DFS + RocksDB
Query API    | Prometheus               | Prometheus    | Prometheus          | Prometheus + own      | Prometheus (via proxy)
Setup        | Moderate                 | Complex       | Complex             | Very simple           | Complex
Cost         | Low (S3 pricing)         | Low           | Low                 | Moderate              | High
Best For     | Pragmatic HA + retention | Multi-tenant  | Grafana shops       | Simplicity + speed    | Uber-scale

FAQ

Q: Thanos vs VictoriaMetrics — which is better? A: VictoriaMetrics is simpler (single binary) and faster; Thanos uses cheaper object storage and is the reference HA solution for vanilla Prometheus. Pick Thanos if you already run Prometheus; VictoriaMetrics if starting fresh.

Q: Do I need the Receive component? A: Only if the Sidecar pattern is impractical, for example when Prometheus can't run close to your apps (short-lived edge workloads) or Thanos Query can't reach into the cluster. Otherwise Sidecar + object storage is the recommended pattern.

Q: How much does object storage cost? A: Often cents per GB per month. A cluster scraping 100K series at 1m resolution for a year might cost $5–20/month in S3 — dramatically cheaper than SSD block storage.
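
The arithmetic behind an estimate like this can be sketched in a few lines of Python. Both defaults are assumptions rather than figures from this article: roughly 1.5 bytes per compressed sample is a commonly quoted Prometheus ballpark, and $0.023 per GB-month is an assumed S3 Standard list price that should be checked against current pricing. The function name `monthly_storage_cost` is hypothetical.

```python
def monthly_storage_cost(series, scrape_interval_s, days,
                         bytes_per_sample=1.5, usd_per_gb_month=0.023):
    """Estimate the steady-state monthly bill for `days` of raw samples.

    bytes_per_sample and usd_per_gb_month are assumptions, not measurements.
    """
    samples = series * (86_400 // scrape_interval_s) * days
    gigabytes = samples * bytes_per_sample / 1e9
    return gigabytes, gigabytes * usd_per_gb_month


gb, usd = monthly_storage_cost(series=100_000, scrape_interval_s=60, days=365)
print(f"{gb:.1f} GB stored, about ${usd:.2f}/month")
# prints: 78.8 GB stored, about $1.81/month
```

This covers raw samples only. Index files, downsampled copies, request charges, and label churn all add to the bill, so the result is best read as a lower bound.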

Q: Does Thanos replace Prometheus? A: No — it augments it. Your Prometheus instances still do scrape + short-term local storage; Thanos handles long-term and global view.
