Introduction
Cortex extends Prometheus with horizontal scalability, long-term storage, and multi-tenancy. While Prometheus stores data locally on a single node, Cortex ingests metrics from multiple Prometheus instances via remote_write, distributes them across a cluster, and stores them durably in object storage like S3 or GCS. It exposes a fully Prometheus-compatible query API so Grafana and other tools work without changes.
What Cortex Does
- Ingests Prometheus metrics via the remote_write API from any number of Prometheus instances
- Stores time series durably in object storage (S3, GCS, Azure Blob) with configurable retention
- Provides a Prometheus-compatible query frontend that serves PromQL queries for every tenant
- Supports multi-tenancy with per-tenant limits, isolation, and authentication
- Compacts and deduplicates stored blocks in the background for efficient storage
Architecture Overview
Cortex is composed of microservices that scale independently. The distributor receives incoming samples and shards them by series hash across ingesters, which batch writes into chunks and periodically flush them to long-term object storage. The query frontend splits and caches PromQL queries and forwards them to queriers, which read recent data from ingesters and historical data from object storage. A compactor runs background jobs to merge and deduplicate stored blocks.
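The sharding step performed by the distributor can be illustrated with a toy consistent-hash ring. This is a minimal sketch rather than Cortex's actual implementation: the `Ring` class, the SHA-256 based hash, the token count, and the `replication_factor` parameter are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash32(value: str) -> int:
    """Map a string to a 32-bit integer (illustrative hash, not Cortex's own)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:4], "big")


class Ring:
    """Toy consistent-hash ring in which each ingester owns several tokens."""

    def __init__(self, ingesters, tokens_per_ingester=64):
        self._tokens = sorted(
            (_hash32(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._keys = [token for token, _ in self._tokens]

    def owners(self, tenant, labels, replication_factor=3):
        """Return the distinct ingesters that should receive this series."""
        # Sort labels so the same series always produces the same key.
        series_key = tenant + "/" + ",".join(
            f"{k}={v}" for k, v in sorted(labels.items())
        )
        start = bisect.bisect_right(self._keys, _hash32(series_key))
        found = []
        # Walk clockwise from the series hash, skipping duplicate owners.
        for offset in range(len(self._tokens)):
            _, name = self._tokens[(start + offset) % len(self._tokens)]
            if name not in found:
                found.append(name)
            if len(found) == replication_factor:
                break
        return found


ring = Ring([f"ingester-{i}" for i in range(5)])
targets = ring.owners("team-a", {"__name__": "up", "job": "api"})
```

Because the key is derived from the tenant and the sorted label set, every sample of a given series lands on the same ingesters, which is what lets them batch writes per series.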
Self-Hosting & Configuration
- Deploy using the official Helm chart or Jsonnet/Tanka configuration
- Set storage.engine=blocks and configure the S3/GCS bucket for long-term storage
- Run in single-process mode for development or microservices mode for production
- Configure tenant IDs via the X-Scope-OrgID header on remote_write and query requests
- Set per-tenant ingestion and query limits in the runtime configuration file
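As a rough illustration of the tenant header, the sketch below builds a PromQL instant-query request scoped to one tenant using only the Python standard library. The host name, the tenant ID `team-a`, and the `/prometheus/api/v1/query` path are assumptions for this example; the actual address and path prefix depend on how a given cluster is configured.

```python
import urllib.parse
import urllib.request

# Hypothetical address of a Cortex query endpoint.
CORTEX_URL = "http://cortex.example.internal:9009"


def build_query_request(tenant_id: str, promql: str) -> urllib.request.Request:
    """Build (but do not send) an instant-query request scoped to one tenant."""
    params = urllib.parse.urlencode({"query": promql})
    return urllib.request.Request(
        # Path prefix is an assumption; adjust it to the deployment.
        f"{CORTEX_URL}/prometheus/api/v1/query?{params}",
        headers={"X-Scope-OrgID": tenant_id},
    )


request = build_query_request("team-a", "sum(rate(http_requests_total[5m]))")
# To run it against a live cluster: urllib.request.urlopen(request)
```

The same header would be attached to remote_write requests, so reads and writes for a tenant resolve to the same isolated data set.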
Key Features
- Horizontally scalable: add more ingesters, queriers, or compactors independently
- Multi-tenant by default with per-tenant rate limits, retention policies, and query isolation
- 100% PromQL compatible — use existing Grafana dashboards and alerting rules unchanged
- Ruler component evaluates recording and alerting rules without a standalone Prometheus
- Shuffle sharding reduces blast radius by assigning each tenant to a subset of ingesters
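The idea behind shuffle sharding can be sketched in a few lines: derive a stable seed from the tenant ID and use it to pick a fixed-size subset of ingesters. This is a simplified model, not Cortex's algorithm; the function name `tenant_shard`, the SHA-256 seed, and the shard size are assumptions for illustration.

```python
import hashlib
import random


def tenant_shard(tenant_id, ingesters, shard_size):
    """Pick a stable pseudo-random subset of ingesters for one tenant."""
    # A hash of the tenant ID seeds the generator, so the choice is repeatable.
    seed = int.from_bytes(hashlib.sha256(tenant_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    # Sort first so the result does not depend on the input order.
    pool = sorted(ingesters)
    return sorted(rng.sample(pool, min(shard_size, len(pool))))


ingesters = [f"ingester-{i}" for i in range(12)]
shard = tenant_shard("team-a", ingesters, shard_size=3)
```

With 12 ingesters and a shard size of 3 there are 220 possible shards, so under this model two tenants rarely share an identical set, and a tenant that overloads its ingesters affects only the tenants whose shards overlap with it.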
Comparison with Similar Tools
- Thanos — sidecar-based approach that queries existing Prometheus stores; Cortex uses remote_write ingestion
- Grafana Mimir — Cortex fork by Grafana Labs with performance improvements and AGPLv3 license
- VictoriaMetrics — single-binary or clustered; simpler operations but fewer multi-tenant features
- M3 — Uber's metrics platform with its own query language; Cortex stays PromQL-native
- InfluxDB — time-series database with its own protocol; Cortex integrates with the Prometheus ecosystem
FAQ
Q: How does Cortex differ from Thanos? A: Cortex uses remote_write to centrally ingest metrics; Thanos uses a sidecar on each Prometheus to upload blocks. Both provide long-term storage and global querying.
Q: What is the relationship between Cortex and Grafana Mimir? A: Mimir is a fork of Cortex maintained by Grafana Labs. Cortex continues as an independent CNCF project with its own roadmap.
Q: Can I query years of historical data? A: Yes. Cortex stores data in object storage with configurable retention. The query frontend splits large queries for parallel execution.
Q: Does Cortex support recording rules? A: Yes. The ruler component evaluates Prometheus recording and alerting rules directly in Cortex without needing a separate Prometheus.
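The query splitting mentioned in the FAQ can be pictured as cutting a long time range into interval-aligned pieces that queriers evaluate in parallel. The following is a minimal sketch under assumptions: the helper name `split_by_interval` and the one-day default interval are illustrative and are not taken from Cortex's code.

```python
from datetime import datetime, timedelta, timezone


def split_by_interval(start, end, interval=timedelta(days=1)):
    """Split [start, end) into sub-ranges aligned to multiples of `interval`."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    ranges = []
    cursor = start
    while cursor < end:
        # Next boundary that is an exact multiple of the interval since epoch.
        next_boundary = epoch + ((cursor - epoch) // interval + 1) * interval
        ranges.append((cursor, min(next_boundary, end)))
        cursor = next_boundary
    return ranges


start = datetime(2024, 1, 1, 18, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 4, 6, 0, tzinfo=timezone.utc)
parts = split_by_interval(start, end)
```

Aligning the pieces to fixed boundaries means that repeated queries over overlapping ranges produce identical sub-queries, which is what makes their results cacheable.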