What Loki Does
- Log Aggregation: Collect and store logs from all your services in one place
- LogQL: Prometheus-inspired query language for searching and aggregating logs
- Label-Based Indexing: Indexes only metadata (labels), not log content, which can cut storage cost by up to roughly 10x compared with full-text indexing
- Grafana Integration: Native integration with Grafana for visualization
- Multi-Tenancy: Separate logs per tenant/team/environment
- Horizontal Scaling: Scale read/write paths independently
- Cloud Native: Designed for Kubernetes and cloud environments
- Compression: Gzip/LZ4/Snappy log compression
- Retention: Configurable retention periods per label stream
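To make the label-based model above concrete, here is a minimal Python sketch of the JSON body a client sends to Loki's push endpoint (`/loki/api/v1/push`). The helper names `build_push_payload` and `build_headers`, the label values, and the fixed timestamp are illustrative assumptions. The `X-Scope-OrgID` header is how Loki identifies the tenant when multi-tenancy is enabled.

```python
import json
import time


def build_push_payload(labels, lines, now_ns=None):
    """Build a JSON-serializable body for Loki's push endpoint.

    labels: dict of label name -> value; together they identify one stream.
    lines:  list of raw log lines; their content is stored but not indexed.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    # Each entry is [timestamp in nanoseconds as a string, log line].
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


def build_headers(tenant=None):
    """HTTP headers for the push request; X-Scope-OrgID selects the tenant."""
    headers = {"Content-Type": "application/json"}
    if tenant is not None:
        headers["X-Scope-OrgID"] = tenant
    return headers


payload = build_push_payload(
    {"job": "demo", "host": "myserver"},  # hypothetical label values
    ["GET /health 200", "POST /login 500"],
    now_ns=1_700_000_000_000_000_000,  # fixed timestamp for a reproducible example
)
body = json.dumps(payload)  # this string would be the POST body
```

Only the two labels end up in the index. The log lines themselves travel as opaque strings, which is why keeping label sets small matters.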
Architecture
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Applications │────▶│ Promtail /   │────▶│ Loki         │
│ Containers   │     │ Fluent Bit / │     │ Distributor  │
│ Systemd      │     │ Vector /     │     │ Ingester     │
└──────────────┘     │ OTel         │     │ Querier      │
                     └──────────────┘     └──────┬───────┘
                                                 │
                                          ┌──────┴───────┐
                                          │ Object       │
                                          │ Storage      │
                                          │ (S3/GCS/     │
                                          │ MinIO/local) │
                                          └──────────────┘

Self-Hosting
Docker Compose (Simple Setup)
services:
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - loki-data:/loki
  promtail:
    image: grafana/promtail:latest
    volumes:
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      # Needed by the docker_sd_configs job in promtail-config.yml
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./promtail-config.yml:/etc/promtail/config.yml
    command: -config.file=/etc/promtail/config.yml
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      # Anonymous admin access is for local testing only
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: Admin
volumes:
  loki-data:

Promtail Config
# promtail-config.yml
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          host: myserver
          __path__: /var/log/*log
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      # Docker reports names as "/nginx"; strip the leading slash
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: container

LogQL (Query Language)
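LogQL queries are usually typed into Grafana, but they can also be sent straight to Loki's HTTP API. As a minimal sketch, assuming the instance from the Compose setup is reachable at `http://localhost:3100`, the following Python builds a request URL for the `/loki/api/v1/query_range` endpoint. The helper name `query_range_url` and the timestamps are illustrative, not part of any Loki client library.

```python
from urllib.parse import urlencode


def query_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a URL for Loki's range-query endpoint.

    start_ns / end_ns are Unix timestamps in nanoseconds.
    """
    params = {
        "query": logql,
        "start": str(start_ns),
        "end": str(end_ns),
        "limit": str(limit),
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


url = query_range_url(
    "http://localhost:3100",  # assumed address: the port published by Compose
    '{container="nginx"} |= "error"',
    start_ns=1_700_000_000_000_000_000,
    end_ns=1_700_000_300_000_000_000,
)
print(url)
```

The resulting URL can be fetched with any HTTP client, and the response is JSON. `urlencode` takes care of escaping the braces, quotes, and pipes that LogQL uses.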
Basic Queries
# All logs from nginx
{container="nginx"}
# Logs containing "error" (case insensitive)
{container="nginx"} |~ "(?i)error"
# JSON logs with specific field
{job="api"} | json | level="error"
# Exclude healthchecks
{container="nginx"} != "/health"
# Multiple filters
{namespace="production", app="web"} |= "500" != "healthcheck"

Metric Queries
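Metric queries turn a stream of log lines into numbers: `count_over_time` counts the entries inside a range window, and `rate` divides that count by the window length in seconds. The toy Python model below illustrates the arithmetic behind an error-rate percentage. The timestamps are invented, and the window-boundary handling is a simplification rather than a statement about Loki's exact semantics.

```python
def count_over_time(timestamps, window_start, window_end):
    """Count entries whose timestamp (in seconds) falls inside the window."""
    return sum(1 for t in timestamps if window_start < t <= window_end)


def rate(timestamps, window_start, window_end):
    """Entries per second over the window."""
    count = count_over_time(timestamps, window_start, window_end)
    return count / (window_end - window_start)


# Hypothetical entry timestamps (seconds) for one stream over two minutes.
all_lines = [1, 5, 20, 30, 45, 50, 58, 61, 90, 119]
error_lines = [5, 45, 90]  # the subset that contains "ERROR"

total_rate = rate(all_lines, 0, 120)
error_rate = rate(error_lines, 0, 120)
error_pct = error_rate / total_rate * 100

print(count_over_time(all_lines, 0, 60), round(error_pct, 2))
```

Because both rates share the same window length, the percentage reduces to the ratio of the two counts, which is why the division form works for error-rate dashboards.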
# Count errors per minute
count_over_time({container="api"} |= "ERROR" [1m])
# Rate of requests
rate({container="nginx"}[5m])
# Error rate percentage
sum(rate({app="api"} |= "ERROR" [5m]))
/ sum(rate({app="api"}[5m])) * 100
# Top 10 hosts by log volume
topk(10, sum(rate({job="varlogs"}[5m])) by (host))

Structured Logs
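Parser stages such as `json` and `regexp` extract labels from each log line at query time, and later stages filter on those labels. The sketch below is a rough Python analogy, not Loki's implementation. The sample log lines and the helper names `json_stage` and `regexp_stage` are made up for illustration; the named-group syntax `(?P<name>...)` happens to be accepted by Python's `re` module as well.

```python
import json
import re

# Named capture groups become label names: an HTTP method and a request path.
PATTERN = re.compile(r"(?P<method>\w+) (?P<path>/\S+)")


def json_stage(line):
    """Roughly like `| json`: top-level JSON fields become labels."""
    try:
        parsed = json.loads(line)
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


def regexp_stage(line):
    """Roughly like `| regexp`: named groups become labels, {} if no match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else {}


json_lines = [
    '{"status": 502, "duration": 1500, "path": "/checkout"}',
    '{"status": 200, "duration": 12, "path": "/health"}',
]
# Keep lines where status >= 500 and duration > 1000.
slow_errors = [
    line for line in json_lines
    if json_stage(line).get("status", 0) >= 500
    and json_stage(line).get("duration", 0) > 1000
]

access_lines = ["POST /login HTTP/1.1", "GET /health HTTP/1.1"]
# Keep lines whose extracted method label equals "POST".
posts = [line for line in access_lines if regexp_stage(line).get("method") == "POST"]
```

Extracted labels exist only for the duration of the query. They are never written to the index, so parsing adds query-time CPU cost but no storage cost.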
# Parse JSON and filter
{app="api"}
| json
| status >= 500
| duration > 1000
# Extract labels from logs
{app="web"}
| regexp `(?P<method>\w+) (?P<path>/\S+)`
| method="POST"

Key Features
Cost Efficiency
Elasticsearch indexes every word in logs:
→ 100GB logs → 200-400GB storage
→ High CPU for indexing
→ Expensive RAM requirements
Loki indexes only labels:
→ 100GB logs → 50-100GB storage (compressed)
→ Low CPU for indexing
→ Minimal RAM (only for query time)

Label-Based Sharding
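The idea that labels are the index while content is only scanned at query time can be sketched with a toy in-memory store. This is a simplified Python model under stated assumptions, not how Loki stores chunks; the class name `TinyLogStore` and the log lines are invented for illustration.

```python
from collections import defaultdict


def stream_key(labels):
    """A stream is identified by its full label set; key order does not matter."""
    return tuple(sorted(labels.items()))


class TinyLogStore:
    """Toy model: labels are indexed, line content is only scanned on query."""

    def __init__(self):
        self.streams = defaultdict(list)  # stream key -> list of log lines

    def push(self, labels, line):
        self.streams[stream_key(labels)].append(line)

    def query(self, matchers, contains=""):
        """Select streams by exact label match, then grep their lines."""
        hits = []
        for key, lines in self.streams.items():
            labels = dict(key)
            if all(labels.get(k) == v for k, v in matchers.items()):
                hits.extend(line for line in lines if contains in line)
        return hits


store = TinyLogStore()
api = {"namespace": "prod", "app": "api", "pod": "api-abc123"}
web = {"namespace": "prod", "app": "web", "pod": "web-xyz789"}
store.push(api, "ERROR db timeout")
store.push(api, "GET /users 200")
store.push(web, "ERROR render failed")
# Same labels in a different order land in the same stream.
store.push({"pod": "api-abc123", "app": "api", "namespace": "prod"}, "ERROR retry")

api_errors = store.query({"app": "api"}, contains="ERROR")
```

The model also hints at why high-cardinality labels are costly: every new label value (a request ID, for example) would create a new stream key, growing the index instead of the cheaply stored content.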
Log stream = unique combination of labels
{namespace="prod", app="api", pod="api-abc123"}
{namespace="prod", app="web", pod="web-xyz789"}
Labels become index keys
Log content is only scanned during queries

Integration with Metrics
Grafana Dashboard:
├── CPU Usage (Prometheus metric)
├── Error Rate (LogQL count_over_time)
├── Recent Errors (Loki logs)
└── Link errors to trace in Tempo

Loki vs Alternatives
| Feature | Loki | Elasticsearch | Splunk | Graylog |
|---|---|---|---|---|
| Open Source | Yes (AGPL-3.0) | Yes (Elastic/AGPL) | No | Source-available (SSPL) |
| Indexing | Labels only | Full-text | Full-text | Full-text |
| Storage cost | Low | High | Very high | Medium |
| Query language | LogQL | KQL/Lucene | SPL | Graylog syntax |
| Grafana integration | Native | Plugin | Plugin | Plugin |
| Scale | Horizontal | Horizontal | Horizontal | Horizontal |
| Best for | Label-rich env (K8s) | Full-text search | Enterprise | Mid-size |
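FAQ
Q: How do I choose between Loki and Elasticsearch? A: If you mainly filter logs by time range and labels (container, namespace, pod), Loki is cheaper and more efficient. If you need complex full-text search and analysis over log content, Elasticsearch is more powerful.
Q: Why index only labels? A: This is Loki's core design decision. Most log queries amount to "give me the logs for service X during time window Y," which a label index can answer on its own. Content matching is then handled at query time with grep-style filtering. This approach can reduce storage cost by 10x or more.
Q: What scale is it suited for? A: Anything from single-machine logging (a few GB/day) to large production clusters (several TB/day). A single-instance deployment handles small to medium workloads, and a distributed deployment scales out roughly linearly to the petabyte range.
Sources and Credits
- GitHub: grafana/loki | 28K+ ⭐ | AGPL-3.0
- Website: grafana.com/loki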