What Grafana Does
- Dashboards: Build interactive dashboards with 50+ panel types (graphs, gauges, tables, heatmaps, maps)
- Data Sources: Connect to 100+ data sources, including Prometheus, InfluxDB, Elasticsearch, PostgreSQL, MySQL, Loki, Tempo, and cloud services
- Alerting: Define alert rules, notification channels (Slack, PagerDuty, email), and escalation policies
- Explore: Ad-hoc query interface for investigating metrics and logs in real time
- Annotations: Mark events on graphs (deploys, incidents, changes) for correlation
- Variables: Template variables for dynamic, reusable dashboards
- Provisioning: Infrastructure-as-code dashboard and datasource management via YAML/JSON
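The Provisioning bullet above can be made concrete with a data source file. The following is a minimal sketch, assuming Grafana's default provisioning directory (`/etc/grafana/provisioning/datasources/`) and a Prometheus server reachable at `http://prometheus:9090`. The file name and the URL are illustrative assumptions, not values taken from a real deployment.

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml (hypothetical file name)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                  # Grafana's backend issues the queries
    url: http://prometheus:9090    # assumed address; adjust to your environment
    isDefault: true
```

Grafana reads the files in this directory at startup, so the data source should be available without any manual setup in the UI.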
Architecture
┌──────────────┐     ┌──────────────┐     ┌────────────────┐
│   Browser    │────▶│   Grafana    │────▶│  Data Sources  │
│  Dashboard   │     │ Server (Go)  │     │ Prometheus     │
└──────────────┘     └──────────────┘     │ InfluxDB       │
                                          │ Elasticsearch  │
                                          │ PostgreSQL     │
                                          │ Loki (Logs)    │
                                          │ Tempo (Traces) │
                                          └────────────────┘
Self-Hosting
Docker Compose with Prometheus
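The Compose file in this section bind-mounts `./prometheus.yml` into the Prometheus container, so that file needs to exist before the stack is started. A minimal sketch that only scrapes Prometheus itself could look like the following; the scrape interval and job name are illustrative choices rather than requirements.

```yaml
# prometheus.yml (placed next to the Compose file)
global:
  scrape_interval: 15s             # how often targets are scraped

scrape_configs:
  - job_name: prometheus           # Prometheus scraping its own /metrics endpoint
    static_configs:
      - targets: ["localhost:9090"]
```

Additional jobs (application exporters, node exporters, and so on) would be appended under `scrape_configs`.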
services:
  grafana:
    image: grafana/grafana-oss:latest
    ports:
      - "3000:3000"
    environment:
      # Initial admin password; replace before exposing the UI
      GF_SECURITY_ADMIN_PASSWORD: your-password
    volumes:
      # Persists dashboards, users, and settings across restarts
      - grafana-data:/var/lib/grafana
    depends_on:
      - prometheus
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
volumes:
  grafana-data:
  prometheus-data:
Key Features
Dashboard Panels
Available panel types:
├── Time Series — line/area/bar charts over time
├── Stat — single value with sparkline
├── Gauge — circular gauge with thresholds
├── Bar Chart — categorical comparisons
├── Table — tabular data with sorting/filtering
├── Heatmap — time-based density visualization
├── Geomap — geographic data on world map
├── Logs — log line viewer with search
├── Traces — distributed trace visualization
├── Node Graph — network topology
├── Canvas — custom layout with drag-and-drop
└── 40+ more community panels
PromQL Example (Prometheus)
# CPU usage per container
rate(container_cpu_usage_seconds_total{namespace="production"}[5m]) * 100
# Request rate per status (uses the same `status` label as the error-rate queries)
sum(rate(http_requests_total[5m])) by (status)
# P99 latency
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
# Error rate percentage
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100
Alerting
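The rule group in this section uses the Prometheus rule-file format, so in a plain Prometheus setup it is Prometheus that has to be told where to find it. A minimal sketch, assuming the rule is saved as `alerts.yml` and mounted at `/etc/prometheus/alerts.yml` (both the file name and the mount path are hypothetical):

```yaml
# prometheus.yml (excerpt)
global:
  evaluation_interval: 30s         # how often rules are evaluated; illustrative value

rule_files:
  - /etc/prometheus/alerts.yml     # hypothetical path to the rule group
```

With the Compose setup from the Self-Hosting section, this would likely also need an extra volume entry that mounts the rule file into the Prometheus container.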
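# Alert rule example (Prometheus rule-file format)
groups:
  - name: critical
    rules:
      - alert: HighErrorRate
        # Fires when 5xx responses exceed 5% of all requests
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        # The condition must hold for 5 minutes before the alert fires
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 5%"
Dashboard as Code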
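Dashboards kept as JSON files can be loaded at startup through a file provider. A minimal sketch, assuming the file lives under `/etc/grafana/provisioning/dashboards/` and that dashboard JSON files are stored in `/var/lib/grafana/dashboards`; the provider name and both paths are illustrative assumptions.

```yaml
# /etc/grafana/provisioning/dashboards/default.yml (hypothetical file name)
apiVersion: 1

providers:
  - name: default
    type: file
    disableDeletion: false         # allow removal when the file disappears
    updateIntervalSeconds: 30      # how often Grafana rescans the directory
    allowUiUpdates: false          # keep the files as the source of truth
    options:
      path: /var/lib/grafana/dashboards   # assumed directory of dashboard JSON files
```

File provisioning typically expects the bare dashboard object (the value of the `dashboard` key in this section's JSON), while the wrapped form matches what the HTTP API accepts.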
{
  "dashboard": {
    "title": "My Dashboard",
    "panels": [
      {
        "title": "CPU Usage",
        "type": "timeseries",
        "datasource": "Prometheus",
        "targets": [
          {
            "expr": "rate(node_cpu_seconds_total{mode='idle'}[5m])",
            "legendFormat": "{{instance}}"
          }
        ]
      }
    ]
  }
}
LGTM Stack (Full Observability)
Grafana Labs provides a complete open-source observability stack:
| Component | Purpose | Data Type |
|---|---|---|
| Grafana | Visualization | Dashboards |
| Loki | Log aggregation | Logs |
| Tempo | Distributed tracing | Traces |
| Mimir | Metrics storage | Metrics |
| Alloy | Telemetry collector | Collection |
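To show how the components in the table plug into Grafana, here is a sketch of a data source provisioning file covering Loki, Tempo, and Mimir. All hostnames and ports are assumptions for a hypothetical deployment and need to match your own services; Mimir is registered as a `prometheus` data source because it exposes a Prometheus-compatible query API.

```yaml
# /etc/grafana/provisioning/datasources/lgtm.yml (hypothetical file name)
apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100                # assumed hostname and port
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200               # assumed hostname and port
  - name: Mimir
    type: prometheus                     # queried through the Prometheus-compatible API
    access: proxy
    url: http://mimir:9009/prometheus    # assumed hostname, port, and path prefix
```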
Grafana vs Alternatives
| Feature | Grafana | Kibana | Datadog | New Relic |
|---|---|---|---|---|
| Open Source | Yes (AGPL-3.0) | Yes (Elastic) | No | No |
| Self-hosted | Yes | Yes | No | No |
| Data sources | 100+ | Elasticsearch | Proprietary | Proprietary |
| Alerting | Built-in | Built-in | Built-in | Built-in |
| Pricing | Free (OSS) | Free (Basic) | $15/host/mo | $0.30/GB |
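FAQ
Q: Is Grafana suitable for beginners? A: Grafana itself has a gentle learning curve: drag in panels and pick a data source. The hard part is understanding the data source, for example Prometheus's PromQL query language. A good way to start is by importing community dashboards.
Q: What is the difference between Grafana OSS and Grafana Cloud? A: OSS is the completely free, self-hosted edition. Cloud is a managed service whose free tier includes 10K metrics, 50 GB of logs, and 50 GB of traces. Cloud also includes some features that OSS lacks, such as ML-driven alerting.
Q: Can it be embedded in my own application? A: Yes. Grafana supports iframe embedding and anonymous access configuration. You can also use Grafana's API to fetch panel screenshots.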
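For the embedding question, iframe embedding and anonymous access are both controlled by configuration. A minimal sketch as a Docker Compose override, assuming the `grafana` service name from the Self-Hosting section; the read-only `Viewer` role is an illustrative choice.

```yaml
# docker-compose.override.yml (hypothetical file name)
services:
  grafana:
    environment:
      GF_SECURITY_ALLOW_EMBEDDING: "true"   # permit rendering inside an iframe
      GF_AUTH_ANONYMOUS_ENABLED: "true"     # let visitors view without logging in
      GF_AUTH_ANONYMOUS_ORG_ROLE: Viewer    # anonymous users get read-only access
```

Anonymous access exposes dashboards to anyone who can reach the server, so it is usually worth pairing with network-level restrictions.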
Sources and Credits
- GitHub: grafana/grafana — 73.1K+ ⭐ | AGPL-3.0
- Website: grafana.com