Skills · April 10, 2026 · 1 min read

Prometheus — Open Source Monitoring & Alerting Toolkit

Prometheus is the CNCF-graduated monitoring system and time series database. Pull-based metrics collection, powerful PromQL queries, and built-in alerting for cloud-native infrastructure.

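Agent ready

Review before installing

This asset needs review first. The copied instructions tell the agent to do a dry-run and list everything it would write, and to continue only after you confirm.

Needs Confirmation · 64/100 · Policy: confirmation required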
Agent entry: any MCP/CLI agent
Type: Skill
Install: Single
Trust level: Established
Entry: step-1.md
Review-first command:
npx -y tokrepo@latest install ed3a8de4-34ae-11f1-9bc6-00163e2b0d79 --target codex

Do a dry-run first, and run this command only after confirming what it will write.

TL;DR
Prometheus scrapes metrics from targets, stores them as time series, and provides PromQL for querying and alerting on infrastructure health.
§01

What it is

Prometheus is an open-source monitoring system and time series database, originally built at SoundCloud and now a CNCF-graduated project. It uses a pull-based model: it scrapes metrics from instrumented targets at configured intervals, stores them in its local time series database, and provides PromQL, a query language for aggregation, filtering, and alerting.
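To make the pull model concrete, here is a minimal sketch of a scrape target: an HTTP endpoint that serves counters in the Prometheus text exposition format, which Prometheus fetches on every scrape. It uses only the Python standard library; the metric name `http_requests_total`, its labels, the values, and the `serve()` helper and port are hypothetical examples, not part of any official client library.

```python
"""Minimal scrape target: serves counters in the Prometheus text exposition
format on /metrics. Metric name, labels, values, and port are hypothetical."""
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical counter values keyed by (method, status) label pairs.
REQUESTS = {("get", "200"): 1027, ("post", "500"): 3}


def escape_label_value(value: str) -> str:
    # The text format requires escaping backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(requests: dict) -> str:
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, status), count in sorted(requests.items()):
        labels = (f'method="{escape_label_value(method)}",'
                  f'status="{escape_label_value(status)}"')
        lines.append(f"http_requests_total{{{labels}}} {count}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Blocks forever, exposing /metrics on the given (arbitrary) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and adding the machine's address and port to a job's `targets` would let Prometheus scrape it. In practice an official client library generates this text for you; the sketch only shows what a scrape returns.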

It is designed for DevOps engineers and SREs who need reliable metrics collection, alerting, and dashboarding for containerized and cloud-native workloads.

§02

How it saves time or tokens

Prometheus discovers scrape targets in Kubernetes through its built-in service discovery, so the target list follows services as they scale up or down without manual configuration. PromQL expresses complex queries, such as the rate of HTTP errors over 5 minutes or the 99th percentile latency per endpoint, in a single expression. The companion Alertmanager, a separate component, routes the alerts Prometheus fires to Slack, PagerDuty, or email according to configurable routing rules, replacing custom alerting scripts.
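The routing idea can be sketched in a few lines. This is a deliberately simplified model, not Alertmanager's implementation: the first route whose label matchers all equal the alert's labels chooses the receiver, and otherwise a default receiver is used. Real Alertmanager routes form a tree with regex matchers, grouping, and a `continue` flag, none of which is modeled; the `Route` class and the receiver names are hypothetical.

```python
"""Simplified Alertmanager-style routing: first matching route wins, else the
default receiver. Route tree, regex matchers, grouping, and `continue` are
not modeled. Receiver names are hypothetical."""
from dataclasses import dataclass, field


@dataclass
class Route:
    receiver: str
    matchers: dict = field(default_factory=dict)  # label name -> required value

    def matches(self, labels: dict) -> bool:
        # Every matcher must equal the alert's label value.
        return all(labels.get(k) == v for k, v in self.matchers.items())


def route_alert(labels: dict, routes: list, default_receiver: str) -> str:
    for route in routes:
        if route.matches(labels):
            return route.receiver
    return default_receiver


ROUTES = [
    Route("pagerduty-oncall", {"severity": "critical"}),
    Route("slack-team", {"team": "payments"}),
]

alert = {"alertname": "HighErrorRate", "severity": "critical", "team": "payments"}
print(route_alert(alert, ROUTES, "email-default"))  # pagerduty-oncall
```

Because the order of `ROUTES` decides which match wins, the more urgent route is listed first; Alertmanager's real configuration has the same ordering sensitivity among sibling routes.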

§03

How to use

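  1. Create a minimal prometheus.yml in the current directory. This configuration makes Prometheus scrape its own metrics endpoint.
global:
  scrape_interval: 15s          # how often every target is scraped
scrape_configs:
  - job_name: prometheus        # attached to each series as job="prometheus"
    static_configs:
      - targets: ['localhost:9090']   # Prometheus itself, inside the container
  2. Start Prometheus with Docker, mounting that file as its configuration. Create the file first: if the host path does not exist, Docker creates an empty directory there and the container fails to start.
docker run -d --name prometheus -p 9090:9090 \
  -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml" \
  prom/prometheus:latest
  3. Open http://localhost:9090 and query metrics with PromQL, for example the per-second rate of HTTP requests Prometheus has served, averaged over the last 5 minutes.
rate(prometheus_http_requests_total[5m])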
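The same query can be run programmatically through the Prometheus HTTP API instant-query endpoint, `GET /api/v1/query`. The sketch below uses only the standard library and assumes the server started above is reachable at `http://localhost:9090`; the helper names `build_query_url`, `parse_vector`, and `instant_query` are hypothetical.

```python
"""Sketch: run a PromQL instant query via GET /api/v1/query.
Assumes a Prometheus server at http://localhost:9090; helper names are hypothetical."""
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:9090"


def build_query_url(base_url: str, promql: str) -> str:
    # urlencode percent-encodes brackets, braces, and quotes in the expression.
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload: dict) -> list:
    """Return (labels, float value) pairs from an instant-vector response."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample's "value" is [unix_timestamp, "value as a string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(promql: str, base_url: str = BASE_URL) -> list:
    with urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return parse_vector(json.load(resp))
```

For example, `instant_query("rate(prometheus_http_requests_total[5m])")` would return one `(labels, value)` pair per series, with labels such as `handler` and `code`. Parsing is kept separate from the network call so it can be checked against a saved response.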
§04

Example

A PromQL query for alerting on high error rates:

sum(rate(http_requests_total{status=~"5.."}[5m]))
  /
sum(rate(http_requests_total[5m]))
  > 0.05

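Used as the expression of an alerting rule, this returns a result, and so fires the alert, when more than 5 percent of HTTP requests return 5xx status codes over a 5-minute window. It assumes the application exposes http_requests_total with a status label; some instrumentation names that label code instead.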
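A rough model of what that expression computes helps when choosing a threshold. In this simplified sketch, `rate()` is just the counter's increase over the window divided by the window length, with counter resets handled; real Prometheus also extrapolates to the window edges, so its numbers differ slightly. The sample values are made up, and `increase`, `simple_rate`, and `error_ratio` are hypothetical helper names.

```python
"""Simplified model of sum(rate(5xx)) / sum(rate(all)) > 0.05.
No edge extrapolation, unlike real rate(). Sample data is made up."""


def increase(samples: list) -> float:
    """Total increase of a counter from (timestamp, value) samples, handling resets."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. a process restart); count from zero.
        total += cur - prev if cur >= prev else cur
    return total


def simple_rate(samples: list) -> float:
    duration = samples[-1][0] - samples[0][0]
    return increase(samples) / duration if duration > 0 else 0.0


def error_ratio(error_series: list, all_series: list) -> float:
    errors = sum(simple_rate(s) for s in error_series)
    total = sum(simple_rate(s) for s in all_series)
    # No traffic: return 0.0 here (PromQL's 0/0 is NaN, which never exceeds a threshold).
    return errors / total if total > 0 else 0.0


# One hypothetical series each over 300 s: 5xx responses and all responses.
errors_5xx = [[(0, 10), (150, 25), (300, 40)]]
all_requests = [[(0, 1000), (150, 1200), (300, 1400)]]
ratio = error_ratio(errors_5xx, all_requests)
print(f"error ratio = {ratio:.3f}, alert: {ratio > 0.05}")  # error ratio = 0.075, alert: True
```

Here 30 errors out of 400 requests in the window gives 7.5 percent, above the 5 percent threshold.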

§05

Related on TokRepo

  • Monitoring tools -- Compare Prometheus with other observability solutions.
  • DevOps tools -- Infrastructure automation that pairs with monitoring.
§06

Common pitfalls

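  • Prometheus stores data on local disk and, by default, keeps it for 15 days (--storage.tsdb.retention.time). For long-term retention, use a system such as Thanos or Cortex.
  • High-cardinality labels (user IDs, request IDs) make memory usage explode, because every unique combination of label values is a separate time series. Keep label cardinality bounded.
  • The pull model requires network access from Prometheus to every target; for targets behind a firewall, run a Prometheus server inside that network. Short-lived batch jobs that may exit before they are scraped can push their metrics to the Pushgateway, which Prometheus then scrapes.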
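The cardinality pitfall is easy to quantify: the worst-case number of series for one metric is the product of the number of distinct values of each label. This back-of-the-envelope sketch uses hypothetical label sets and a hypothetical helper, `worst_case_series`; it computes an upper bound, since not every combination necessarily occurs.

```python
"""Worst-case series count for one metric: product of distinct values per label.
Label sets are hypothetical."""
from math import prod


def worst_case_series(label_values: dict) -> int:
    # Upper bound: assumes every combination of label values actually occurs.
    return prod(len(set(values)) for values in label_values.values())


bounded = {
    "method": ["GET", "POST", "PUT", "DELETE"],
    "status": ["200", "404", "500"],
    "endpoint": [f"/api/v1/{name}" for name in ("users", "orders", "items")],
}
with_user_id = {**bounded, "user_id": [str(i) for i in range(50_000)]}

print(worst_case_series(bounded))       # 36
print(worst_case_series(with_user_id))  # 1800000
```

Adding one unbounded label multiplies the series count by its number of values, which is why IDs belong in logs or traces rather than in metric labels.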

FAQ

How does Prometheus differ from Grafana?

Prometheus collects and stores metrics and evaluates alerting rules. Grafana is a visualization layer that queries Prometheus (and other data sources) to render dashboards. They are complementary tools typically deployed together.

Does Prometheus work with Kubernetes?

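Yes. Prometheus has built-in Kubernetes service discovery: it discovers nodes, services, pods, endpoints, and ingresses through the Kubernetes API. Relabeling rules then decide which of them to scrape, often keyed on annotations such as prometheus.io/scrape, a widely used convention rather than a built-in feature. The kube-prometheus-stack Helm chart bundles Prometheus, Alertmanager, Grafana, and pre-built dashboards.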
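The annotation convention works through a `keep` relabeling rule. The sketch below models that rule in plain Python: the values of the source labels are joined with `;` and the regex must match the whole string, as in Prometheus relabeling, and targets that do not match are dropped. The discovered targets are hypothetical; the label name follows Prometheus's pattern of turning a pod annotation into `__meta_kubernetes_pod_annotation_<name>` with dots and slashes replaced by underscores.

```python
"""Model of a `keep` relabel rule filtering discovered pods by the
prometheus.io/scrape annotation. Target data is hypothetical."""
import re

SCRAPE_LABEL = "__meta_kubernetes_pod_annotation_prometheus_io_scrape"


def keep(targets: list, source_labels: list, regex: str, separator: str = ";") -> list:
    pattern = re.compile(regex)
    kept = []
    for labels in targets:
        # Missing labels contribute an empty string to the joined value.
        value = separator.join(labels.get(name, "") for name in source_labels)
        if pattern.fullmatch(value):  # relabel regexes are fully anchored
            kept.append(labels)
    return kept


discovered = [
    {"__address__": "10.0.0.5:8080", SCRAPE_LABEL: "true"},
    {"__address__": "10.0.0.6:8080", SCRAPE_LABEL: "false"},
    {"__address__": "10.0.0.7:9100"},  # no annotation at all
]
targets = keep(discovered, [SCRAPE_LABEL], "true")
print([t["__address__"] for t in targets])  # ['10.0.0.5:8080']
```

In a real `prometheus.yml` the same rule is written declaratively under `relabel_configs` with `action: keep`, `source_labels`, and `regex`.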

What is PromQL?

PromQL (Prometheus Query Language) is a functional query language for selecting, aggregating, and transforming time series data. It supports operations like rate, histogram_quantile, sum by label, and mathematical functions.
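Of those functions, `histogram_quantile` is the least obvious: it estimates a quantile from cumulative `le` buckets by assuming observations are spread evenly within the bucket where the quantile falls. The sketch below reproduces that linear interpolation for the common case and skips PromQL's handling of NaN values, negative bucket bounds, and non-monotonic buckets. The bucket data is hypothetical.

```python
"""Linear interpolation behind histogram_quantile(), simplified.
Buckets are (upper_bound, cumulative_count) pairs ending with +Inf. Data is hypothetical."""
import math


def histogram_quantile(q: float, buckets: list) -> float:
    if not 0 < q <= 1:
        raise ValueError("this sketch expects 0 < q <= 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(upper):
                # Quantile falls in the +Inf bucket: report the last finite bound.
                return buckets[i - 1][0]
            lower = buckets[i - 1][0] if i > 0 else 0.0
            prev_count = buckets[i - 1][1] if i > 0 else 0.0
            # Assume observations are spread evenly inside this bucket.
            return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)
    return buckets[-2][0]


# Request durations in seconds: 60 of 100 under 0.1 s, 90 under 0.25 s, and so on.
latency = [(0.1, 60), (0.25, 90), (0.5, 98), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.5, latency))   # about 0.0833, inside the 0-0.1 s bucket
print(histogram_quantile(0.95, latency))  # 0.40625, inside the 0.25-0.5 s bucket
```

The estimate can only be as precise as the bucket layout, so bucket boundaries should sit near the latencies you alert on.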

How does alerting work in Prometheus?

You define alerting rules in YAML files that specify PromQL conditions and durations. When a condition is true for the specified duration, Prometheus fires the alert to Alertmanager, which handles deduplication, grouping, and routing to notification channels.
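The duration in a rule (its `for` clause) gives each alert a small life cycle: the alert is pending while the condition has been true for less than the duration, firing once it has held for at least that long, and inactive again as soon as one evaluation finds the condition false. This sketch models that state machine for a single series; the `AlertState` class, evaluation times, and results are hypothetical, and real Prometheus tracks state per series and sends firing alerts to Alertmanager.

```python
"""Pending/firing life cycle created by an alerting rule's `for` duration.
Single series only; timestamps and results are hypothetical."""
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    for_seconds: float
    active_since: Optional[float] = None  # when the condition first became true

    def evaluate(self, now: float, condition_true: bool) -> str:
        if not condition_true:
            self.active_since = None  # any false evaluation resets the timer
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


rule = AlertState(for_seconds=300)  # like `for: 5m`
timeline = [(0, True), (60, True), (120, False), (180, True), (480, True)]
states = [rule.evaluate(t, cond) for t, cond in timeline]
print(states)  # ['pending', 'pending', 'inactive', 'pending', 'firing']
```

The false result at 120 s restarts the clock, so the alert fires only at 480 s, 300 s after the condition became true again.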

Can Prometheus handle long-term storage?

Prometheus is optimized for short-to-medium retention (days to weeks). For long-term storage, integrate with Thanos, Cortex, or VictoriaMetrics, which provide horizontal scaling and object-store-backed retention.
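A quick capacity estimate shows why long local retention gets expensive. The Prometheus storage documentation gives a rough rule of thumb: disk = retention in seconds × ingested samples per second × bytes per sample, with compressed samples averaging roughly 1 to 2 bytes. The sketch applies that rule; the series count, scrape interval, the 2-byte default, and the helper names are hypothetical inputs.

```python
"""Rough local-disk estimate: retention_seconds * samples_per_second * bytes_per_sample.
Series count, interval, and bytes per sample are hypothetical inputs."""


def ingested_samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    # Each active series produces one sample per scrape.
    return active_series / scrape_interval_s


def local_disk_bytes(retention_days: float, samples_per_second: float,
                     bytes_per_sample: float = 2.0) -> float:
    return retention_days * 86_400 * samples_per_second * bytes_per_sample


sps = ingested_samples_per_second(active_series=1_000_000, scrape_interval_s=15)
for days in (15, 365):
    gib = local_disk_bytes(days, sps) / 2**30
    print(f"{days:>3} days: ~{gib:,.0f} GiB")
```

Disk grows linearly with retention, so going from weeks to a year multiplies local storage by more than twenty at the same ingestion rate.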
