Scripts · April 16, 2026 · 1 min read

Cortex — Horizontally Scalable Long-Term Storage for Prometheus

Cortex is a CNCF project that provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus metrics, letting you run Prometheus-as-a-Service at scale.

Introduction

Cortex extends Prometheus with horizontal scalability, long-term storage, and multi-tenancy. While Prometheus stores data locally on a single node, Cortex ingests metrics from multiple Prometheus instances via remote_write, distributes them across a cluster, and stores them durably in object storage like S3 or GCS. It exposes a fully Prometheus-compatible query API so Grafana and other tools work without changes.

What Cortex Does

  • Ingests Prometheus metrics via the remote_write API from any number of Prometheus instances
  • Stores time series durably in object storage (S3, GCS, Azure Blob) with configurable retention
  • Provides a Prometheus-compatible query API for PromQL, with each query scoped to the requesting tenant
  • Supports multi-tenancy with per-tenant limits and isolation; authentication is expected to be handled by a reverse proxy in front of Cortex
  • Compacts and deduplicates stored blocks in the background for efficient storage
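The background deduplication mentioned in the last bullet can be pictured with a small sketch. It assumes, purely for illustration, that each replica's copy of a series is a time-sorted list of (timestamp, value) pairs; the name merge_replicas is hypothetical and this is not Cortex's compactor code.

```python
import heapq
from typing import Iterable, List, Tuple

Sample = Tuple[int, float]  # (timestamp in milliseconds, value)


def merge_replicas(replicas: Iterable[List[Sample]]) -> List[Sample]:
    """Merge time-sorted copies of one series, keeping one sample per timestamp."""
    merged: List[Sample] = []
    # heapq.merge lazily interleaves the already-sorted inputs by timestamp.
    for ts, value in heapq.merge(*replicas, key=lambda s: s[0]):
        if merged and merged[-1][0] == ts:
            continue  # same timestamp already kept from another replica
        merged.append((ts, value))
    return merged


# Two replicas that overlap on timestamps 1000 and 3000 (made-up data).
replica_a = [(1000, 1.0), (2000, 2.0), (3000, 3.0)]
replica_b = [(1000, 1.0), (3000, 3.0), (4000, 4.0)]
deduplicated = merge_replicas([replica_a, replica_b])
```

The result holds each timestamp once, which is the storage saving that compaction is after when the same samples were written by more than one ingester.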

Architecture Overview

Cortex is composed of microservices, each of which scales independently. The distributor receives incoming samples and shards them by series hash across the ingesters. Ingesters hold recent samples in memory and on local disk, and periodically upload them as blocks to long-term object storage. The query frontend splits and caches PromQL queries, forwarding them to queriers that read from both ingesters (recent data) and object storage (historical data). A compactor runs background jobs to merge and deduplicate stored blocks.
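The distributor's sharding by series hash can be sketched as a toy hash ring. Everything here is an assumption made for illustration: the Ring class, the SHA-256 based hash, the 64 tokens per ingester, and the replication factor of 3 are not taken from Cortex's implementation.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple


def _hash32(data: str) -> int:
    """Stable 32-bit hash (an illustrative choice, not Cortex's hash function)."""
    return int.from_bytes(hashlib.sha256(data.encode()).digest()[:4], "big")


class Ring:
    """Toy hash ring: each ingester owns several tokens on a 32-bit circle."""

    def __init__(self, ingesters: List[str], tokens_per_ingester: int = 64) -> None:
        self._tokens: List[Tuple[int, str]] = sorted(
            (_hash32(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._keys = [token for token, _ in self._tokens]
        self._num_ingesters = len(set(ingesters))

    def owners(self, tenant: str, labels: Dict[str, str], replication: int = 3) -> List[str]:
        """Return the distinct ingesters that should receive this series."""
        # The series key combines the tenant with the sorted label set.
        series_key = tenant + "/" + ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
        start = bisect.bisect_left(self._keys, _hash32(series_key))
        want = min(replication, self._num_ingesters)
        found: List[str] = []
        i = start
        # Walk clockwise from the series hash until enough distinct owners are found.
        while len(found) < want:
            _, name = self._tokens[i % len(self._tokens)]
            if name not in found:
                found.append(name)
            i += 1
        return found


ring = Ring([f"ingester-{n}" for n in range(5)])
series = {"__name__": "http_requests_total", "job": "api"}
targets = ring.owners("tenant-a", series)
```

Because the mapping depends only on the ring contents and the series key, any distributor replica holding the same ring would pick the same ingesters for a given series.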

Self-Hosting & Configuration

  • Deploy using the official Helm chart or Jsonnet/Tanka configuration
  • Set storage.engine to blocks and configure the S3/GCS bucket under blocks_storage for long-term storage
  • Run in single-binary mode (-target=all) for development or microservices mode for production
  • Configure tenant IDs via the X-Scope-OrgID header on remote_write and query requests
  • Set per-tenant ingestion and query limits in the runtime configuration file
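As a rough illustration of the tenant header described above, the following sketch builds a tenant-scoped instant query without sending it. The host name and the helper build_instant_query are hypothetical, and the /prometheus path prefix is assumed to match the deployment's configured Prometheus API prefix.

```python
import urllib.parse
import urllib.request

CORTEX_URL = "http://cortex.example.internal:9009"  # hypothetical address


def build_instant_query(tenant_id: str, promql: str) -> urllib.request.Request:
    """Build (but do not send) a tenant-scoped instant PromQL query."""
    params = urllib.parse.urlencode({"query": promql})
    # "/prometheus" is assumed to be the configured Prometheus API prefix.
    url = f"{CORTEX_URL}/prometheus/api/v1/query?{params}"
    # urllib stores the header name as "X-scope-orgid"; HTTP header names
    # are case-insensitive, so the server still sees the tenant header.
    return urllib.request.Request(url, headers={"X-Scope-OrgID": tenant_id})


request = build_instant_query("tenant-a", "sum(rate(http_requests_total[5m]))")
```

Passing the request to urllib.request.urlopen would send it. A Prometheus remote_write client needs the same header on its writes so that samples land in the matching tenant.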

Key Features

  • Horizontally scalable: add more ingesters, queriers, or compactors independently
  • Multi-tenant by default with per-tenant rate limits, retention policies, and query isolation
  • PromQL compatible: Cortex reuses the Prometheus query engine, so existing Grafana dashboards and alerting rules work unchanged
  • Ruler component evaluates recording and alerting rules without a standalone Prometheus
  • Shuffle sharding reduces blast radius by assigning each tenant to a subset of ingesters
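The idea behind shuffle sharding can be sketched in a few lines: each tenant gets a deterministic, pseudo-random subset of ingesters. This is a simplified illustration, not Cortex's selection algorithm; the function shuffle_shard, the SHA-256 seeding, and the shard size of 3 are all assumptions.

```python
import hashlib
import random
from typing import List


def shuffle_shard(tenant_id: str, ingesters: List[str], shard_size: int) -> List[str]:
    """Pick a deterministic pseudo-random subset of ingesters for one tenant."""
    if shard_size <= 0 or shard_size >= len(ingesters):
        return sorted(ingesters)  # sharding disabled, or the shard covers everything
    # Seed from the tenant ID so every caller computes the same subset.
    seed = int.from_bytes(hashlib.sha256(tenant_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(ingesters), shard_size))


ingesters = [f"ingester-{n}" for n in range(12)]
shard_a = shuffle_shard("tenant-a", ingesters, 3)
shard_b = shuffle_shard("tenant-b", ingesters, 3)
```

With 12 ingesters and shards of 3, two tenants rarely share their whole shard, so a tenant that overloads its ingesters affects only the few tenants whose shards overlap with it.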

Comparison with Similar Tools

  • Thanos — sidecar-based approach that queries existing Prometheus stores; Cortex uses remote_write ingestion
  • Grafana Mimir — Cortex fork by Grafana Labs with performance improvements and AGPLv3 license
  • VictoriaMetrics — single-binary or clustered; simpler operations but fewer multi-tenant features
  • M3 — Uber's metrics platform with its own query language; Cortex stays PromQL-native
  • InfluxDB — time-series database with its own protocol; Cortex integrates with the Prometheus ecosystem

FAQ

Q: How does Cortex differ from Thanos?
A: Cortex uses remote_write to centrally ingest metrics; Thanos uses a sidecar on each Prometheus to upload blocks. Both provide long-term storage and global querying.

Q: What is the relationship between Cortex and Grafana Mimir?
A: Mimir is a fork of Cortex maintained by Grafana Labs. Cortex continues as an independent CNCF project with its own roadmap.

Q: Can I query years of historical data?
A: Yes. Cortex stores data in object storage with configurable retention. The query frontend splits large queries for parallel execution.
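The splitting step can be illustrated with a short sketch that cuts a long time range into interval-aligned sub-ranges. It is a simplification under stated assumptions: times are plain Unix seconds, the default interval is one day, and split_by_interval is a hypothetical helper rather than the query frontend's code.

```python
from typing import List, Tuple

DAY_SECONDS = 24 * 60 * 60


def split_by_interval(start: int, end: int, interval: int = DAY_SECONDS) -> List[Tuple[int, int]]:
    """Split [start, end] (Unix seconds, inclusive) into interval-aligned sub-ranges."""
    if end < start:
        raise ValueError("end must not be before start")
    ranges: List[Tuple[int, int]] = []
    cursor = start
    while cursor <= end:
        # Last second of the aligned bucket containing `cursor`, capped at `end`.
        bucket_end = min((cursor // interval + 1) * interval - 1, end)
        ranges.append((cursor, bucket_end))
        cursor = bucket_end + 1
    return ranges


# A query spanning a little over one day becomes two sub-ranges.
sub_ranges = split_by_interval(100, 90_000)
```

Each sub-range can be executed independently and in parallel, and aligning the boundaries to the interval lets a cached result for a full day be reused by later queries that cover the same day.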

Q: Does Cortex support recording rules?
A: Yes. The ruler component evaluates Prometheus recording and alerting rules directly in Cortex without needing a separate Prometheus.
