Skills · Apr 14, 2026 · 3 min read

Thanos — Global Prometheus with Unlimited Retention and High Availability

Thanos extends Prometheus with global query, unlimited storage via object storage, and HA replication. It is the proven way to run Prometheus at multi-cluster, multi-year scale without changing your existing workflow.

Review-first install command

npx -y tokrepo@latest install 63ff1c2c-37c8-11f1-9bc6-00163e2b0d79 --target codex

Do a dry run first, confirm the writes it lists, then run this command.

TL;DR

Thanos extends Prometheus with global query, unlimited object-storage retention, and HA for multi-cluster observability.

§01

What it is

Thanos extends Prometheus with global query capabilities, unlimited storage via object storage (S3, GCS, Azure), and high-availability replication. It is the proven way to run Prometheus at multi-cluster, multi-year scale without changing your existing Prometheus setup. Thanos runs as a set of sidecar and gateway components alongside your Prometheus instances.

Thanos is designed for SRE and platform teams running Prometheus across multiple clusters who need a unified query interface and long-term metric retention.

§02

How it saves time or tokens

Prometheus alone has limited local storage and no built-in multi-cluster querying. Running Prometheus at scale requires either over-provisioning disk or losing old metrics. Thanos solves both problems: a sidecar uploads Prometheus blocks to cheap object storage for indefinite retention, and the Query component federates queries across all Prometheus instances. You keep your existing Prometheus setup and add Thanos components alongside it.

§03

How to use

Run the Thanos sidecar next to each Prometheus instance:

thanos sidecar \
  --tsdb.path=/prometheus \
  --prometheus.url=http://localhost:9090 \
  --objstore.config-file=bucket.yml
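
For the sidecar to upload blocks safely, the Thanos documentation asks that Prometheus's own local compaction be turned off by setting the minimum and maximum block duration to the same value. A minimal sketch of the matching Prometheus flags, assuming the same /prometheus data path as the sidecar command; the helper name prometheus_sidecar_flags is hypothetical and only prints the flags:

```shell
# Hypothetical helper: prints the Prometheus flags usually paired with an
# uploading Thanos sidecar. Equal min/max block durations stop Prometheus
# from compacting blocks locally before the sidecar ships them.
prometheus_sidecar_flags() {
  tsdb_path="$1"   # must match the sidecar's --tsdb.path
  printf '%s\n' \
    "--storage.tsdb.path=${tsdb_path}" \
    "--storage.tsdb.min-block-duration=2h" \
    "--storage.tsdb.max-block-duration=2h"
}

prometheus_sidecar_flags /prometheus
```

The printed flags can be appended to an existing Prometheus command line. Each Prometheus instance should also define unique external_labels in its own configuration so that Thanos can tell the sources apart.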

Configure object storage in bucket.yml:

type: S3
config:
  bucket: thanos-metrics
  endpoint: s3.amazonaws.com
  access_key: '...'
  secret_key: '...'
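
To keep credentials out of version control, bucket.yml can be generated at deploy time. A minimal sketch, assuming the keys arrive through environment variables; the variable names THANOS_S3_ACCESS_KEY and THANOS_S3_SECRET_KEY and the helper write_bucket_config are hypothetical, while the YAML keys are the ones shown above:

```shell
# Hypothetical helper: writes an S3 objstore config with the same keys as
# the bucket.yml above, taking credentials from the environment.
# The body runs in a subshell so the umask change and any early exit
# do not affect the calling shell.
write_bucket_config() (
  out_file="$1"
  # Stop with a clear message if a credential is missing.
  : "${THANOS_S3_ACCESS_KEY:?set THANOS_S3_ACCESS_KEY}"
  : "${THANOS_S3_SECRET_KEY:?set THANOS_S3_SECRET_KEY}"
  umask 077   # the file holds secrets; keep it private to the current user
  cat > "$out_file" <<EOF
type: S3
config:
  bucket: thanos-metrics
  endpoint: s3.amazonaws.com
  access_key: '${THANOS_S3_ACCESS_KEY}'
  secret_key: '${THANOS_S3_SECRET_KEY}'
EOF
)
```

Calling write_bucket_config bucket.yml before starting the sidecar produces the file it expects. This sketch does not escape single quotes, so it assumes the credentials contain none.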

Run the Thanos Query component for global queries:

# Recent Thanos releases use --endpoint; older ones used --store for the same purpose.
thanos query \
  --endpoint=sidecar-1:10901 \
  --endpoint=sidecar-2:10901 \
  --endpoint=store-gateway:10901

Open the Thanos Query HTTP endpoint (port 10902 by default) in a browser to reach the global, Prometheus-style query UI.
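
The same endpoint also serves a Prometheus-compatible HTTP API, so the global view can be checked from the command line. A sketch, assuming Thanos Query is reachable at a base URL such as http://thanos-query:10902 (a hypothetical host); the helper name and the CURL override are illustrative, and dedup is a Thanos extension to the Prometheus query API:

```shell
# Hypothetical helper: runs a PromQL instant query against Thanos Query.
thanos_instant_query() {
  base_url="${1%/}"   # e.g. http://thanos-query:10902, trailing slash removed
  expr="$2"           # PromQL expression; curl URL-encodes it
  # CURL can be overridden (for example CURL=echo) to preview the call.
  "${CURL:-curl}" -sS -G "${base_url}/api/v1/query" \
    --data-urlencode "query=${expr}" \
    --data-urlencode "dedup=true"
}
```

For example, thanos_instant_query http://thanos-query:10902 up returns the up series from every connected Prometheus instance as JSON.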

§04

Example

A complete Thanos deployment architecture:

Cluster A:  Prometheus + Thanos Sidecar ─┐
                                          ├─> Thanos Query ─> Grafana
Cluster B:  Prometheus + Thanos Sidecar ─┤
                                          │
Object Storage (S3) <── Sidecar uploads ──┘
         │
         └──> Thanos Store Gateway ──> Thanos Query
                (serves historical data)

Grafana points at Thanos Query as its Prometheus data source and gets a unified view across all clusters and time ranges.

§05

Related on TokRepo

Monitoring tools — Browse observability and monitoring tools
DevOps tools — Explore infrastructure tooling

§06

Common pitfalls

Not deploying a Store Gateway for historical queries. Without it, Thanos Query can only reach the data that live Prometheus instances still hold locally. The Store Gateway serves the blocks in object storage for long-term queries.
Forgetting to run the Compactor. Thanos Compact merges small blocks, downsamples older data, and applies retention in object storage. Without it, the bucket fills with many small blocks, nothing is ever deleted, and queries over long time ranges slow down.
Running Thanos Query without deduplication. If you run HA Prometheus pairs, set --query.replica-label to the external label that distinguishes the replicas so that Thanos Query merges their duplicate series.
Starting with an overly complex configuration instead of defaults. Begin with the minimal setup, verify it works, then customize incrementally. This approach catches configuration errors early and keeps troubleshooting straightforward.
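
The first three pitfalls each map to one extra component or flag. A sketch of the corresponding commands, wrapped in a small dry-run helper so they can be reviewed before anything starts; the data directories and the replica label name are assumptions, and releases that predate --endpoint take --store instead:

```shell
# Prints each command; executes it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Store Gateway: serves blocks that were already uploaded to the bucket.
start_store_gateway() {
  run thanos store --data-dir=/var/thanos/store --objstore.config-file=bucket.yml
}

# Compactor: compacts, downsamples, and applies retention.
# --wait keeps it running instead of exiting after one pass.
start_compactor() {
  run thanos compact --data-dir=/var/thanos/compact --objstore.config-file=bucket.yml --wait
}

# Query with deduplication: "replica" is the assumed external label
# that differs between the two members of an HA Prometheus pair.
start_query() {
  run thanos query --query.replica-label=replica \
    --endpoint=sidecar-1:10901 --endpoint=store-gateway:10901
}
```

With the default DRY_RUN=1 each function only prints its command. Each component runs in the foreground, so in practice every function would back its own service or container.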

Frequently asked questions

Does Thanos replace Prometheus?

No. Thanos runs alongside Prometheus. You keep your existing Prometheus instances for scraping and alerting. Thanos adds global querying, long-term storage, and HA replication on top of Prometheus.

How much does object storage cost for Thanos?

Object storage is inexpensive compared with provisioned disk. S3 Standard costs about $0.023/GB/month. At that rate, a bucket holding a few hundred gigabytes of blocks costs a few dollars per month; the actual figure depends on series count, scrape interval, and retention settings.
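
The arithmetic behind such an estimate is simple enough to script. A sketch using awk, with the per-GB price as a parameter that defaults to the $0.023 figure quoted above; the helper name is hypothetical, and request and transfer charges are ignored:

```shell
# Monthly storage cost in USD for a bucket of the given size.
# $1: size in GB, $2: price per GB-month (optional, defaults to 0.023).
storage_cost_usd() {
  LC_ALL=C awk -v gb="$1" -v price="${2:-0.023}" \
    'BEGIN { printf "%.2f\n", gb * price }'
}

storage_cost_usd 200    # 200 GB of blocks, prints 4.60
```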

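Can Thanos downsample old metrics?

Yes. Thanos Compact produces 5-minute and 1-hour downsampled copies of older data. Downsampling speeds up queries over long time ranges while keeping enough resolution for trend analysis. On its own it does not shrink the bucket, because the downsampled blocks are stored in addition to the raw ones; storage only drops once a shorter retention is set for raw data.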

How does Thanos compare to Cortex/Mimir?

Thanos and Cortex/Mimir solve the same problem (scaling Prometheus) with different architectures. Thanos typically attaches a sidecar to each existing Prometheus and reads from it at query time, which makes it simpler to adopt. Cortex and Mimir use a push-based model: each Prometheus sends its samples to a central cluster via remote write. Mimir may scale better for very large deployments.

Does Thanos support Prometheus recording rules?

Yes. Thanos Ruler evaluates recording rules and alerting rules against Thanos Query (the global view), enabling rules that span multiple Prometheus instances.
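
As an illustration, a rule that aggregates across clusters can be written to a file and handed to the Ruler. A sketch, assuming every Prometheus instance sets a cluster external label; the helper name, the rule name, and the file path are hypothetical:

```shell
# Hypothetical helper: writes a recording rule that counts healthy targets
# per cluster, using the standard Prometheus rule-file format.
write_global_rules() (
  cat > "$1" <<'EOF'
groups:
  - name: global
    rules:
      - record: cluster:up:sum
        expr: sum by (cluster) (up)
EOF
)

write_global_rules rules.yml
```

The Ruler could then be started with a command along the lines of thanos rule --rule-file=rules.yml --query=thanos-query:10902 --data-dir=/var/thanos/rule, where the query address and data directory are again assumptions.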

