Configs · April 11, 2026 · 1 min read

Jaeger — CNCF Distributed Tracing Platform

Jaeger is a CNCF-graduated distributed tracing system for monitoring microservice-based architectures. Track requests across services, identify latency hotspots, and understand root causes of failures in complex distributed systems.

AI Open Source · Community
Quick Start

Try it first, then decide whether to dig deeper


# All-in-one dev container
docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 9411:9411 \
  jaegertracing/all-in-one:latest

# UI at http://localhost:16686

Instrument an app with OpenTelemetry (Node.js):

npm i @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node \
      @opentelemetry/exporter-trace-otlp-http
// tracing.ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

const sdk = new NodeSDK({
  serviceName: "tokrepo-api",
  traceExporter: new OTLPTraceExporter({
    url: "http://localhost:4318/v1/traces",
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

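Compile tracing.ts first (Node cannot require a .ts file directly), then run with node --require ./dist/tracing.js dist/server.js so the SDK starts before the app loads. HTTP, database, and framework calls covered by the auto-instrumentations are then traced automatically.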
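For a sense of what the exporter sends to port 4318, the sketch below builds a minimal OTLP/HTTP JSON body for a single span by hand. The field layout follows the OTLP JSON encoding; the helper names (`buildOtlpPayload`, `toUnixNano`), the scope name, and the sample IDs are made up for illustration, and a real app should keep using the SDK shown above.

```typescript
// Hand-built OTLP/HTTP JSON body for one span (illustrative helpers, not SDK code).
interface SpanInput {
  traceId: string; // 32 hex characters
  spanId: string;  // 16 hex characters
  name: string;
  startMs: number; // Unix epoch milliseconds, assumed to be a positive integer
  endMs: number;
}

// OTLP JSON carries nanosecond timestamps as decimal strings.
// Appending six zeros converts integer milliseconds without losing precision.
function toUnixNano(ms: number): string {
  return String(Math.trunc(ms)) + "000000";
}

function buildOtlpPayload(serviceName: string, span: SpanInput) {
  return {
    resourceSpans: [
      {
        resource: {
          // service.name is what Jaeger shows in the service dropdown.
          attributes: [{ key: "service.name", value: { stringValue: serviceName } }],
        },
        scopeSpans: [
          {
            scope: { name: "manual-example" }, // hypothetical scope name
            spans: [
              {
                traceId: span.traceId,
                spanId: span.spanId,
                name: span.name,
                kind: 2, // SPAN_KIND_SERVER
                startTimeUnixNano: toUnixNano(span.startMs),
                endTimeUnixNano: toUnixNano(span.endMs),
              },
            ],
          },
        ],
      },
    ],
  };
}
```

Posting `JSON.stringify(payload)` with `Content-Type: application/json` to http://localhost:4318/v1/traces should make the span show up in the UI under the given service name.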

Introduction

Jaeger is a CNCF-graduated distributed tracing platform originally developed at Uber. It captures, stores, and visualizes traces: sequences of spans that show how a request flows through multiple microservices. This makes it useful for debugging latency and failures in distributed systems.
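To make "a trace is a sequence of spans" concrete, here is a small sketch that models spans with parent links and walks them as a tree. The `Span` shape and function names are simplified assumptions for illustration, not Jaeger's actual data model.

```typescript
// Simplified span model. Field names are illustrative, not Jaeger's storage schema.
interface Span {
  spanId: string;
  parentId: string | null; // null marks the root span
  service: string;
  operation: string;
  startUs: number; // start offset in microseconds
  durationUs: number;
}

// End-to-end duration: latest span end minus earliest span start.
// Assumes at least one span.
function traceDurationUs(spans: Span[]): number {
  const start = Math.min(...spans.map((s) => s.startUs));
  const end = Math.max(...spans.map((s) => s.startUs + s.durationUs));
  return end - start;
}

// Walks the parent/child links and returns an indented outline,
// roughly the information a waterfall view lays out graphically.
function renderTrace(spans: Span[]): string[] {
  const lines: string[] = [];
  const walk = (parentId: string | null, depth: number): void => {
    const children = spans
      .filter((s) => s.parentId === parentId)
      .sort((a, b) => a.startUs - b.startUs);
    for (const s of children) {
      const indent = "  ".repeat(depth);
      lines.push(`${indent}${s.service} ${s.operation} (${s.durationUs / 1000} ms)`);
      walk(s.spanId, depth + 1);
    }
  };
  walk(null, 0);
  return lines;
}
```

Each child span is indented under its parent and siblings are ordered by start time, which is the same ordering a trace timeline relies on.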

What Jaeger Does

  • Trace collection — receive spans via OTLP, Jaeger protocol, Zipkin
  • Storage backends — Elasticsearch, Cassandra, Kafka, Badger, memory
  • Query API — search traces by service, operation, tags, duration
  • UI — waterfall view of spans, service dependencies graph
  • Sampling — adaptive, probabilistic, rate-limited
  • Service Performance Monitoring (SPM) — RED metrics from traces
  • Critical Path — highlight bottleneck spans
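The "RED metrics from traces" idea in the list above can be sketched as a plain aggregation over finished spans: Rate, Errors, and Duration per operation. This only illustrates the arithmetic; in a real deployment SPM reads such metrics from a metrics store rather than computing them like this, and the `FinishedSpan` shape and function names here are hypothetical.

```typescript
interface FinishedSpan {
  operation: string;
  durationMs: number;
  error: boolean;
}

interface RedMetrics {
  ratePerSec: number; // Rate: requests per second over the window
  errorRatio: number; // Errors: fraction of spans flagged as errors
  p95Ms: number;      // Duration: 95th percentile latency
}

// Nearest-rank percentile over an ascending-sorted, non-empty array.
function percentile(sorted: number[], p: number): number {
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Groups spans by operation and derives RED metrics for a time window.
function redByOperation(spans: FinishedSpan[], windowSec: number): Record<string, RedMetrics> {
  const groups: Record<string, FinishedSpan[]> = {};
  for (const s of spans) {
    (groups[s.operation] = groups[s.operation] || []).push(s);
  }
  const out: Record<string, RedMetrics> = {};
  for (const op of Object.keys(groups)) {
    const g = groups[op];
    const durations = g.map((s) => s.durationMs).sort((a, b) => a - b);
    out[op] = {
      ratePerSec: g.length / windowSec,
      errorRatio: g.filter((s) => s.error).length / g.length,
      p95Ms: percentile(durations, 95),
    };
  }
  return out;
}
```

Nearest-rank is used for the percentile because it is easy to verify by hand; histogram-based metrics backends approximate percentiles differently.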

Architecture

Jaeger components:

  • Agent (deprecated, use OTLP) — local daemon
  • Collector — receives spans, writes to storage
  • Query — serves UI and API, reads from storage
  • Ingester — for Kafka async pipeline
  • All-in-one — dev container bundling everything
  • OpenTelemetry Collector — modern ingestion preferred

Self-Hosting

# Production deployment
components:
  - Collector (multi-replica, behind LB)
  - Elasticsearch cluster (storage)
  - Query service (multi-replica)
  - OTel Collector (ingestion)

Kubernetes: use the official Jaeger Operator or Helm charts.

Key Features

  • OpenTelemetry native ingestion
  • Multiple storage backends
  • Service dependency graph
  • Adaptive sampling
  • Trace search and filtering
  • RED metrics (SPM)
  • Zipkin compatibility
  • gRPC and HTTP APIs
  • Kubernetes operator
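As a sketch of trace search over HTTP, the helper below builds a search URL against the query service on port 16686. It assumes the JSON endpoint the Jaeger UI itself calls (`/api/traces` with `service`, `operation`, `tags`, `minDuration`, and `limit` parameters); that endpoint is internal to the UI rather than a documented stable contract, so treat the parameter names as assumptions and check them against your Jaeger version. `buildTraceSearchUrl` and `TraceQuery` are hypothetical names.

```typescript
interface TraceQuery {
  service: string;
  operation?: string;
  tags?: Record<string, string>; // sent as a JSON object string
  minDuration?: string;          // duration string such as "200ms"
  limit?: number;
}

// Builds a search URL; the endpoint and parameter names are assumptions (see above).
function buildTraceSearchUrl(q: TraceQuery, base: string = "http://localhost:16686"): string {
  const parts: string[] = [`service=${encodeURIComponent(q.service)}`];
  if (q.operation) parts.push(`operation=${encodeURIComponent(q.operation)}`);
  if (q.tags) parts.push(`tags=${encodeURIComponent(JSON.stringify(q.tags))}`);
  if (q.minDuration) parts.push(`minDuration=${encodeURIComponent(q.minDuration)}`);
  parts.push(`limit=${q.limit ?? 20}`);
  return `${base}/api/traces?${parts.join("&")}`;
}
```

For example, `buildTraceSearchUrl({ service: "tokrepo-api", tags: { error: "true" }, minDuration: "200ms" })` yields a URL that can be opened with `curl` or `fetch` to look for slow, failing requests.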

Comparison

| Tool | Storage | OTel support | Metrics |
|---|---|---|---|
| Jaeger | ES, Cassandra, Kafka | Yes | SPM |
| Tempo | Object storage | Yes | Via Grafana |
| Zipkin | ES, MySQL, Cassandra | Yes | Partial |
| Honeycomb | Managed | Yes | Yes |
| Lightstep | Managed | Yes | Yes |
| OpenTelemetry Collector | Any backend | Native | Yes |

FAQ

Q: Jaeger vs Tempo? A: Jaeger has its own UI and a mature ecosystem. Tempo stores traces in object storage (cheap), is viewed through Grafana, and integrates more closely with Loki and Prometheus.

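Q: What sampling strategy should I use? A: Do not sample 100% of traces in production; it wastes storage. Use 1% probabilistic sampling plus tag-based force-keep so that errors and slow requests are always retained. Keeping errors and slow requests requires a decision made after the request finishes (tail-based sampling), because a head-based sampler decides before the outcome is known.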
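The sampling rule in this answer can be sketched as a single decision function, assuming it runs once a trace is complete so that errors and total duration are already known. The `TraceSummary` shape, the 1000 ms slow threshold, and the trace-ID bucketing scheme are illustrative assumptions, not Jaeger's or OpenTelemetry's built-in sampler logic.

```typescript
interface TraceSummary {
  traceId: string; // hex string, at least 8 characters
  hasError: boolean;
  durationMs: number;
}

// Returns true if the trace should be kept.
// rate: baseline probabilistic rate; slowMs: hypothetical "slow request" threshold.
function keepTrace(t: TraceSummary, rate: number = 0.01, slowMs: number = 1000): boolean {
  // Force-keep: errors and slow requests are always retained.
  if (t.hasError || t.durationMs >= slowMs) return true;
  // Map the first 8 hex characters of the trace ID onto [0, 1) so the
  // decision is deterministic for a given trace across all services.
  const bucket = parseInt(t.traceId.slice(0, 8), 16) / 0x100000000;
  return bucket < rate;
}
```

Deriving the baseline decision from the trace ID rather than a random number means every component that evaluates the same trace reaches the same answer.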

Q: How does Jaeger relate to OpenTelemetry? A: OTel is the standard (API + SDK + Collector); Jaeger is the backend. New projects should collect with OTel and use Jaeger for storage and querying.

