# Prometheus: Open Source Monitoring & Alerting Toolkit

> Prometheus is the CNCF-graduated monitoring system and time series database. It offers pull-based metrics collection, powerful PromQL queries, and built-in alerting for cloud-native infrastructure.

## Quick Use

Create `prometheus.yml` in your project root:

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
```

Then start the server from the same directory, so the config file exists before it is mounted:

```bash
docker run -d --name prometheus -p 9090:9090 \
  -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml" \
  prom/prometheus:latest
```

Open `http://localhost:9090` and start querying metrics with PromQL.

## Intro

**Prometheus** is an open-source monitoring system and time series database, originally built at SoundCloud and now a CNCF graduated project (the same status as Kubernetes). It collects metrics from configured targets at given intervals, evaluates rule expressions, displays results, and triggers alerts when specified conditions are observed.

With 63.5K+ GitHub stars and an Apache-2.0 license, Prometheus is the de facto standard for cloud-native monitoring and integrates deeply with Kubernetes and the wider CNCF ecosystem.

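Besides the web UI, the Quick Use server can be queried programmatically through the Prometheus HTTP API (`/api/v1/query`). The following is a minimal sketch using only the Python standard library; the helper names (`build_query_url`, `parse_instant_vector`, `instant_query`) are illustrative and not part of any Prometheus client, and the base URL assumes the container from Quick Use is listening on `localhost:9090`.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Quick Use container is reachable at this address.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(expr, base_url=PROMETHEUS_URL):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_instant_vector(payload):
    """Turn an instant-vector API response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is {"metric": {...labels...}, "value": [timestamp, "string value"]}.
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


def instant_query(expr, base_url=PROMETHEUS_URL, timeout=5.0):
    """Run an instant query against a live server and return parsed samples."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))
```

With the container running, `instant_query("up")` should return one `(labels, value)` pair per scraped target. Keeping URL construction and response parsing separate from the network call makes both easy to check without a live server.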
## What Prometheus Does

- **Metrics Collection**: Pull-based metrics scraping from instrumented applications and exporters
- **Time Series DB**: Efficient local storage optimized for time series data with compression
- **PromQL**: Powerful query language for slicing, dicing, and aggregating time series data
- **Alerting**: Alert rules with Alertmanager for routing, grouping, and notification
- **Service Discovery**: Auto-discover targets from Kubernetes, Consul, DNS, EC2, and more
- **Exporters**: 500+ exporters for databases, hardware, messaging, storage, and cloud services
- **Federation**: Hierarchical federation for scaling across multiple Prometheus instances

## Architecture

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Targets    │◀────│  Prometheus  │────▶│ Alertmanager │
│ (Exporters)  │pull │    Server    │     │   (Notify)   │
│              │     │ TSDB + Rules │     └──────────────┘
└──────────────┘     └──────┬───────┘
                            │
                     ┌──────┴───────┐
                     │   Grafana    │
                     │ (Visualize)  │
                     └──────────────┘
```

Key design principle: **pull-based** collection. Prometheus scrapes metrics from HTTP endpoints rather than having applications push them, which makes it easier to detect when a target is down.

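To illustrate why pulling makes a down target easy to spot, here is a rough sketch of a single scrape pass in Python. It is not how Prometheus is implemented; it only mimics the idea behind the `up` series, which is 1 when a scrape succeeds and 0 when it fails. The function names and the injectable `fetch` parameter are assumptions made for the example.

```python
import http.client
import urllib.request


def fetch_metrics(url, timeout=2.0):
    """Fetch the raw text body from a target's metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def scrape_once(targets, fetch=fetch_metrics):
    """Scrape each host:port target once and return {target: 1 or 0}.

    The scraper initiates every request, so a refused connection, timeout,
    or HTTP error is observed directly and recorded as 0 for that target.
    """
    up = {}
    for target in targets:
        try:
            fetch(f"http://{target}/metrics")
            up[target] = 1
        except (OSError, http.client.HTTPException):
            # urllib's URLError/HTTPError and socket timeouts are OSError subclasses.
            up[target] = 0
    return up
```

For example, `scrape_once(["localhost:9090"])` would yield `{"localhost:9090": 1}` while the server is running and `{"localhost:9090": 0}` once it is stopped. In a push model, by contrast, a silent application is indistinguishable from one that simply has nothing to report.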
## Self-Hosting

### Docker Compose (Full Stack)

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alert.rules.yml:/etc/prometheus/alert.rules.yml
      - prometheus-data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=30d"

  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml

  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
    command:
      # Read host metrics from the mounts above, not the container's own /proc and /sys
      - "--path.procfs=/host/proc"
      - "--path.sysfs=/host/sys"

volumes:
  prometheus-data:
```

### Configuration

```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

rule_files:
  - "alert.rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node"
    static_configs:
      - targets: ["node-exporter:9100"]

  # Assumes Prometheus runs inside a Kubernetes cluster; omit this job elsewhere.
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```

## PromQL Essentials

```promql
# Instant vector: current value
up{job="node"}

# Range vector: values over time
node_cpu_seconds_total[5m]

# Rate: per-second rate of increase
rate(http_requests_total[5m])

# Aggregation: sum across instances
sum(rate(http_requests_total[5m])) by (method, status)

# Histogram quantile: P95 latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

# Arithmetic: error percentage
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m])) * 100

# Prediction: disk full in 4 hours?
predict_linear(node_filesystem_avail_bytes[1h], 4 * 3600) < 0
```

## Instrumenting Your App

### Go

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// httpRequests counts handled requests, partitioned by method and status.
var httpRequests = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total HTTP requests",
	},
	[]string{"method", "status"},
)

func init() {
	prometheus.MustRegister(httpRequests)
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		httpRequests.WithLabelValues(r.Method, "200").Inc() // count in the handler
		w.Write([]byte("ok"))
	})
	http.Handle("/metrics", promhttp.Handler()) // endpoint Prometheus scrapes
	log.Fatal(http.ListenAndServe(":8080", nil)) // port 8080 is an arbitrary choice
}
```

### Python

```python
import time

from prometheus_client import Counter, start_http_server

REQUEST_COUNT = Counter('http_requests_total', 'Total requests', ['method', 'status'])

start_http_server(8000)  # Expose /metrics on port 8000 (runs in a background thread)
while True:  # Keep the process alive so the endpoint stays scrapeable
    REQUEST_COUNT.labels(method='GET', status='200').inc()
    time.sleep(1)
```

## Popular Exporters

| Exporter | Metrics |
|----------|---------|
| Node Exporter | CPU, memory, disk, network (Linux) |
| cAdvisor | Container resource usage |
| MySQL Exporter | Query performance, connections |
| PostgreSQL Exporter | Database stats, replication |
| Redis Exporter | Memory, keys, commands |
| Blackbox Exporter | HTTP, DNS, TCP, ICMP probes |
| NGINX Exporter | Requests, connections, status |

## Prometheus vs Alternatives

| Feature | Prometheus | InfluxDB | Datadog | VictoriaMetrics |
|---------|-----------|----------|---------|-----------------|
| Open Source | Yes (Apache-2.0) | Partial | No | Yes (Apache-2.0) |
| Collection | Pull-based | Push-based | Agent | Pull + Push |
| Query | PromQL | InfluxQL/Flux | Proprietary | MetricsQL |
| CNCF | Graduated | No | No | No |
| Long-term storage | Needs remote | Built-in | Built-in | Built-in |
| Kubernetes | Native | Plugin | Agent | Native |

## FAQ

**Q: How long does Prometheus keep data?**
A: The default retention is 15 days. You can change it with `--storage.tsdb.retention.time`. For long-term storage, use a remote storage solution such as Thanos or Cortex.

**Q: Is Prometheus suitable for log collection?**
A: No. Prometheus is built specifically for numeric metrics. For logs, Loki (from Grafana Labs) is the recommended choice and works well alongside Prometheus.

**Q: How many metrics can a single Prometheus instance scrape?**
A: A single instance can handle millions of active time series. Very large environments can scale horizontally with federation or Thanos/Mimir.

## Sources and Acknowledgments

- GitHub: [prometheus/prometheus](https://github.com/prometheus/prometheus) (63.5K+ ⭐, Apache-2.0)
- Website: [prometheus.io](https://prometheus.io)

---

Source: https://tokrepo.com/en/workflows/ed3a8de4-34ae-11f1-9bc6-00163e2b0d79
Author: AI Open Source