Prometheus — Open Source Monitoring & Alerting Toolkit
Prometheus is the CNCF-graduated monitoring system and time series database. Pull-based metrics collection, powerful PromQL queries, and built-in alerting for cloud-native infrastructure.
What it is
Prometheus is an open-source monitoring system and time series database, originally built at SoundCloud and now a CNCF-graduated project. It uses a pull-based model to scrape metrics from instrumented targets at configured intervals, stores them locally, and provides PromQL -- a powerful query language for aggregation, filtering, and alerting.
It is designed for DevOps engineers and SREs who need reliable metrics collection, alerting, and dashboarding for containerized and cloud-native workloads.
How it saves time or tokens
Prometheus discovers scrape targets in Kubernetes through service discovery, so targets need no manual configuration as services scale up or down. PromQL expresses complex queries -- the rate of HTTP errors over 5 minutes, 99th-percentile latency per endpoint -- in a single expression. Alertmanager, a separate component that is deployed alongside Prometheus, routes alerts to Slack, PagerDuty, or email based on configurable rules, replacing custom alerting scripts.
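As a rough illustration of the service discovery described above, a scrape job for Kubernetes pods might look like the sketch below. It assumes Prometheus runs inside the cluster with permission to list pods. The job name and the prometheus.io/scrape annotation are a common convention used here for illustration, not something Prometheus itself requires.

```yaml
scrape_configs:
  - job_name: kubernetes-pods          # hypothetical job name
    kubernetes_sd_configs:
      - role: pod                      # discover pods through the Kubernetes API
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      # (a widely used convention, assumed here).
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Copy the namespace and pod name onto every scraped series.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

New pods that carry the annotation are picked up on the next discovery refresh, and pods that go away drop out of the target list without any config change.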
How to use
- Create a minimal prometheus.yml.
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
- Start Prometheus with Docker, mounting that file into the container.
docker run -d --name prometheus -p 9090:9090 \
  -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml" \
  prom/prometheus:latest
- Open http://localhost:9090 and query metrics with PromQL.
rate(prometheus_http_requests_total[5m])
Example
A PromQL query for alerting on high error rates:
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))
> 0.05
This fires when more than 5 percent of HTTP requests return 5xx status codes over a 5-minute window.
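To turn that expression into an actual alert, it can be wrapped in a rule file. The following is a minimal sketch: the group name, alert name, `for` duration, and severity label are illustrative assumptions, and the file has to be listed under `rule_files` in prometheus.yml to be loaded.

```yaml
groups:
  - name: http-errors                  # hypothetical group name
    rules:
      - alert: HighErrorRate           # hypothetical alert name
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total[5m]))
            > 0.05
        for: 10m                       # assumed: condition must hold this long before firing
        labels:
          severity: page               # assumed label, useful for routing in Alertmanager
        annotations:
          summary: "More than 5% of HTTP requests are returning 5xx"
```

The `for` clause keeps a brief spike from paging anyone: the alert stays pending until the ratio has been above the threshold for the whole duration.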
Related on TokRepo
- Monitoring tools -- Compare Prometheus with other observability solutions.
- DevOps tools -- Infrastructure automation that pairs with monitoring.
Common pitfalls
- Prometheus stores data on local disk by default. For long-term retention, add a long-term storage layer such as Thanos or Cortex.
- High-cardinality labels (user IDs, request IDs) cause memory usage to explode. Keep label cardinality bounded.
- The pull model requires network access from Prometheus to all targets. Pushgateway exists for short-lived batch jobs that exit before they can be scraped; it is not a general workaround for targets behind a firewall.
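For the cardinality pitfall above, one stopgap is to drop the offending labels at scrape time. This is a sketch under assumptions: the job name, target address, and label names (request_id, user_id) are hypothetical.

```yaml
scrape_configs:
  - job_name: api                      # hypothetical job name
    static_configs:
      - targets: ['api.example.internal:8080']   # hypothetical target
    metric_relabel_configs:
      # Remove unbounded labels before samples are stored.
      # The regex is matched against label names.
      - action: labeldrop
        regex: request_id|user_id
```

Dropping a label only works if the remaining labels still identify each series uniquely; otherwise the scrape produces duplicate series. Removing the label from the application's instrumentation is usually the cleaner fix.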
Frequently Asked Questions
What is the difference between Prometheus and Grafana?
Prometheus collects and stores metrics and evaluates alerting rules. Grafana is a visualization layer that queries Prometheus (and other data sources) to render dashboards. They are complementary tools typically deployed together.
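Does Prometheus work with Kubernetes?
Yes. Prometheus has built-in Kubernetes service discovery. It discovers pods, services, and endpoints by querying the Kubernetes API, and annotations are commonly combined with relabeling rules to choose which of them to scrape. The kube-prometheus-stack Helm chart bundles Prometheus, Alertmanager, Grafana, and pre-built dashboards.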
What is PromQL?
PromQL (Prometheus Query Language) is a functional query language for selecting, aggregating, and transforming time series data. It supports operations like rate, histogram_quantile, sum by label, and mathematical functions.
How does alerting work in Prometheus?
You define alerting rules in YAML files that specify PromQL conditions and durations. When a condition is true for the specified duration, Prometheus fires the alert to Alertmanager, which handles deduplication, grouping, and routing to notification channels.
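The routing side lives in Alertmanager's own configuration file. A minimal sketch, assuming a Slack webhook and a PagerDuty integration key that you supply; the receiver names, channel, timings, and the severity label are illustrative.

```yaml
route:
  receiver: team-slack                 # default receiver (hypothetical name)
  group_by: [alertname]                # bundle alerts with the same name into one notification
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Alerts labeled severity="page" go to the on-call receiver instead.
    - matchers:
        - 'severity="page"'
      receiver: oncall-pagerduty

receivers:
  - name: team-slack
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/ME   # placeholder webhook URL
        channel: '#alerts'
  - name: oncall-pagerduty
    pagerduty_configs:
      - routing_key: REPLACE_ME        # placeholder integration key
```

Alerts that match no child route fall through to the default receiver at the top of the tree.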
How long can Prometheus store data?
Prometheus is optimized for short-to-medium retention (days to weeks). For long-term storage, integrate with Thanos, Cortex, or VictoriaMetrics, which provide horizontal scaling and object-store-backed retention.
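For backends that accept Prometheus remote write, the integration is a short addition to prometheus.yml. The endpoint below is a placeholder, since the real URL and path depend on the backend you run.

```yaml
remote_write:
  # Hypothetical endpoint; substitute the write URL of your long-term store.
  - url: https://metrics.example.internal/api/v1/push
```

Thanos is often deployed differently, with a sidecar that uploads local TSDB blocks to object storage, so check which integration mode your chosen backend expects.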
Source & Thanks
- GitHub: prometheus/prometheus — 63.5K+ ⭐ | Apache-2.0
- Website: prometheus.io