Cortex — Horizontally Scalable Long-Term Storage for Prometheus

Introduction

Cortex extends Prometheus with horizontal scalability, long-term storage, and multi-tenancy. While Prometheus stores data locally on a single node, Cortex ingests metrics from multiple Prometheus instances via remote_write, distributes them across a cluster, and stores them durably in object storage like S3 or GCS. It exposes a fully Prometheus-compatible query API so Grafana and other tools work without changes.

What Cortex Does

Ingests Prometheus metrics via the remote_write API from any number of Prometheus instances
Stores time series durably in object storage (S3, GCS, Azure Blob) with configurable retention
Provides a Prometheus-compatible query API for PromQL queries, scoped to the requesting tenant
Supports multi-tenancy with per-tenant limits and data isolation; authentication is typically delegated to a reverse proxy that sets the tenant ID
Compacts and deduplicates stored blocks in the background for efficient storage and faster queries
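The deduplication idea can be illustrated with a small Go sketch. This is not Cortex's compactor code, which builds on Prometheus TSDB block compaction. It only shows the core step of merging two time-sorted sample streams for the same series, such as the copies produced when a series is replicated to several ingesters, and keeping one sample per timestamp. `Sample` and `mergeDedup` are hypothetical names.

```go
package main

import "fmt"

// Sample is a single (timestamp, value) pair; timestamps are Unix
// milliseconds, following the Prometheus convention.
type Sample struct {
	T int64
	V float64
}

// mergeDedup merges two timestamp-sorted sample slices for one series and
// keeps a single sample per timestamp. It assumes each input is sorted and
// has no duplicate timestamps of its own. On a tie, a's sample wins.
func mergeDedup(a, b []Sample) []Sample {
	out := make([]Sample, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].T < b[j].T:
			out = append(out, a[i])
			i++
		case a[i].T > b[j].T:
			out = append(out, b[j])
			j++
		default: // same timestamp in both inputs: keep one copy
			out = append(out, a[i])
			i++
			j++
		}
	}
	// Append whatever remains in the input that was not exhausted.
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

func main() {
	replica1 := []Sample{{1000, 1}, {2000, 2}, {4000, 4}}
	replica2 := []Sample{{2000, 2}, {3000, 3}, {4000, 4}}
	fmt.Println(mergeDedup(replica1, replica2)) // [{1000 1} {2000 2} {3000 3} {4000 4}]
}
```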

Architecture Overview

Cortex is composed of microservices. The distributor receives incoming samples, validates them against per-tenant limits, and shards them by series hash across ingesters, replicating each series to several ingesters for durability. Ingesters keep recent samples in memory and in a local TSDB, and periodically upload completed blocks to long-term object storage. The query frontend splits and caches PromQL queries and forwards them to queriers, which read recent data from ingesters and historical data from object storage through store-gateways. A compactor runs background jobs to merge and deduplicate stored blocks. Each component scales independently.
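The distributor's sharding step can be sketched as a consistent-hash ring with a replication factor. Cortex's real ring keeps ingester tokens in a key-value store (Consul, etcd, or memberlist) and adds logic for availability zones and ingester state; in this simplified sketch the tokens are derived by hashing ingester names, and `Ring`, `NewRing`, and `Get` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// token is one point on the ring, owned by an ingester.
type token struct {
	value    uint32
	ingester string
}

// Ring is a simplified consistent-hash ring. A series is owned by the
// ingesters whose tokens follow the series hash clockwise.
type Ring struct {
	tokens []token // sorted by value
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing gives each ingester tokensPerIngester tokens by hashing "name-i".
// (Assumption for the sketch: real ingesters choose random tokens.)
func NewRing(ingesters []string, tokensPerIngester int) *Ring {
	r := &Ring{}
	for _, ing := range ingesters {
		for i := 0; i < tokensPerIngester; i++ {
			r.tokens = append(r.tokens, token{hash32(fmt.Sprintf("%s-%d", ing, i)), ing})
		}
	}
	sort.Slice(r.tokens, func(i, j int) bool { return r.tokens[i].value < r.tokens[j].value })
	return r
}

// Get returns up to rf distinct ingesters for seriesKey, walking the ring
// clockwise from the key's hash and wrapping around at the end.
func (r *Ring) Get(seriesKey string, rf int) []string {
	h := hash32(seriesKey)
	start := sort.Search(len(r.tokens), func(i int) bool { return r.tokens[i].value >= h })
	seen := map[string]bool{}
	var out []string
	for n := 0; n < len(r.tokens) && len(out) < rf; n++ {
		t := r.tokens[(start+n)%len(r.tokens)]
		if !seen[t.ingester] {
			seen[t.ingester] = true
			out = append(out, t.ingester)
		}
	}
	return out
}

func main() {
	ring := NewRing([]string{"ingester-0", "ingester-1", "ingester-2", "ingester-3"}, 16)
	// Hypothetical series key: tenant ID plus the series' labels.
	fmt.Println(ring.Get(`tenant-a/http_requests_total{job="api"}`, 3))
}
```

Because only the tokens next to a new ingester change owner, adding an ingester moves a fraction of the series instead of reshuffling all of them.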

Self-Hosting & Configuration

Deploy using the official Helm chart or Jsonnet/Tanka configuration
Set storage.engine=blocks and configure the S3/GCS bucket for long-term storage
Run all components in a single process (-target=all) for development, or as separate microservices for production
Configure tenant IDs via the X-Scope-OrgID header on remote_write and query requests
Set per-tenant ingestion and query limits in the runtime configuration file
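The tenant header works the same way on every request. The Go sketch below builds an instant query that carries `X-Scope-OrgID`; it assumes Cortex's default `/prometheus` prefix for the Prometheus-compatible API, and it uses an `httptest` server as a stand-in for Cortex. `newTenantQuery` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// newTenantQuery builds an instant PromQL query for one tenant.
// Assumption: the /prometheus/api/v1/query path reflects Cortex's default
// Prometheus API prefix; adjust it if your deployment changes the prefix.
func newTenantQuery(baseURL, tenantID, promql string) (*http.Request, error) {
	u := baseURL + "/prometheus/api/v1/query?query=" + url.QueryEscape(promql)
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	// Cortex uses this header to decide which tenant's data the request sees.
	req.Header.Set("X-Scope-OrgID", tenantID)
	return req, nil
}

func main() {
	// Stand-in server that echoes the tenant and query instead of running PromQL.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "tenant=%s query=%s", r.Header.Get("X-Scope-OrgID"), r.URL.Query().Get("query"))
	}))
	defer srv.Close()

	req, err := newTenantQuery(srv.URL, "team-a", `up{job="api"}`)
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // tenant=team-a query=up{job="api"}
}
```

On the write path, Prometheus can attach the same header through the `headers` field of its `remote_write` configuration.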

Key Features

Horizontally scalable: add more ingesters, queriers, or compactors independently
Multi-tenant by default with per-tenant rate limits, retention policies, and query isolation
PromQL-compatible: queries run on the Prometheus query engine, so existing Grafana dashboards and alerting rules work unchanged
Ruler component evaluates recording and alerting rules without a standalone Prometheus
Shuffle sharding reduces blast radius by assigning each tenant to a subset of ingesters
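Shuffle sharding relies on every distributor computing the same ingester subset for a tenant without coordination. The Go sketch below shows the idea by seeding a random generator from a hash of the tenant ID. Cortex's real implementation selects tokens on the ring and accounts for zones; this is a simplified illustration, and `shuffleShard` is a hypothetical name.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
	"sort"
)

// shuffleShard deterministically picks shardSize ingesters for a tenant.
// The same tenant always gets the same subset; different tenants usually
// get different, partially overlapping subsets.
func shuffleShard(tenantID string, ingesters []string, shardSize int) []string {
	if shardSize >= len(ingesters) {
		return append([]string(nil), ingesters...)
	}
	h := fnv.New64a()
	h.Write([]byte(tenantID))
	rng := rand.New(rand.NewSource(int64(h.Sum64())))

	// Sort a copy first so the result does not depend on input order.
	pool := append([]string(nil), ingesters...)
	sort.Strings(pool)
	rng.Shuffle(len(pool), func(i, j int) { pool[i], pool[j] = pool[j], pool[i] })
	return pool[:shardSize]
}

func main() {
	ingesters := []string{"ing-0", "ing-1", "ing-2", "ing-3", "ing-4", "ing-5"}
	for _, tenant := range []string{"tenant-a", "tenant-b"} {
		fmt.Println(tenant, shuffleShard(tenant, ingesters, 2))
	}
}
```

If one tenant's traffic overwhelms its shard, only the ingesters in that subset are affected, and other tenants share at most part of it.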

Comparison with Similar Tools

Thanos — sidecar-based approach that queries existing Prometheus stores; Cortex uses remote_write ingestion
Grafana Mimir — Cortex fork by Grafana Labs with performance improvements and AGPLv3 license
VictoriaMetrics — single-binary or clustered; simpler operations but fewer multi-tenant features
M3 — Uber's metrics platform built on its own M3DB storage engine; it supports PromQL and Graphite queries alongside its own M3QL, while Cortex is PromQL-only
InfluxDB — time-series database with its own protocol; Cortex integrates with the Prometheus ecosystem

FAQ

Q: How does Cortex differ from Thanos? A: Cortex uses remote_write to centrally ingest metrics; Thanos uses a sidecar on each Prometheus to upload blocks. Both provide long-term storage and global querying.

Q: What is the relationship between Cortex and Grafana Mimir? A: Mimir is a fork of Cortex maintained by Grafana Labs. Cortex continues as an independent CNCF project with its own roadmap.

Q: Can I query years of historical data? A: Yes. Cortex stores data in object storage with configurable retention. The query frontend splits large queries for parallel execution.

Q: Does Cortex support recording rules? A: Yes. The ruler component evaluates Prometheus recording and alerting rules directly in Cortex without needing a separate Prometheus.
