Configs · Apr 11, 2026 · 1 min read

Jaeger — CNCF Distributed Tracing Platform

Jaeger is a CNCF-graduated distributed tracing system for monitoring microservice-based architectures. Track requests across services, identify latency hotspots, and understand root causes of failures in complex distributed systems.

Quick Use


# All-in-one dev container
# Ports: 16686 = UI, 4317 = OTLP gRPC, 4318 = OTLP HTTP, 9411 = Zipkin
docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 9411:9411 \
  jaegertracing/all-in-one:latest

# UI at http://localhost:16686

Instrument an app with OpenTelemetry (Node.js):

npm i @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node \
      @opentelemetry/exporter-trace-otlp-http

// tracing.ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

const sdk = new NodeSDK({
  serviceName: "tokrepo-api",
  traceExporter: new OTLPTraceExporter({
    url: "http://localhost:4318/v1/traces",
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

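Compile tracing.ts first, then preload the compiled file: node --require ./dist/tracing.js dist/server.js. Node does not reliably load a .ts file through --require without a TypeScript loader, so point it at the compiled output. This assumes tsc emits CommonJS into dist/ (for ES modules, use --import instead). HTTP, DB, and framework calls are then auto-traced.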
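To check that the container accepts data before wiring up the SDK, one option is to post a single hand-built span to the OTLP/HTTP endpoint. The sketch below is a minimal, dependency-free example under a few assumptions: Node.js 18+ (for the global fetch), the all-in-one container from above listening on localhost:4318, and the OTLP JSON encoding (hex-encoded IDs, nanosecond timestamps as strings). The names buildPayload, sendSmokeSpan, and the service name smoke-test are hypothetical and not part of Jaeger or OpenTelemetry.

```typescript
// smoke-test.ts: build one span in OTLP/JSON form and post it to Jaeger.

// Random lowercase hex string of `bytes` bytes (16 for trace IDs, 8 for span IDs).
function randomHex(bytes: number): string {
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += ("0" + Math.floor(Math.random() * 256).toString(16)).slice(-2);
  }
  return out;
}

// Build an OTLP/JSON payload holding a single span that ended "now".
function buildPayload(serviceName: string, spanName: string, durationMs: number) {
  const endMs = Date.now();
  const startMs = endMs - durationMs;
  // Milliseconds to nanoseconds by appending six zeros; avoids float precision loss.
  const toNanos = (ms: number): string => String(ms) + "000000";
  return {
    resourceSpans: [{
      resource: {
        attributes: [{ key: "service.name", value: { stringValue: serviceName } }],
      },
      scopeSpans: [{
        scope: { name: "manual-smoke-test" }, // hypothetical scope name
        spans: [{
          traceId: randomHex(16),
          spanId: randomHex(8),
          name: spanName,
          kind: 1, // SPAN_KIND_INTERNAL
          startTimeUnixNano: toNanos(startMs),
          endTimeUnixNano: toNanos(endMs),
          attributes: [{ key: "smoke.test", value: { boolValue: true } }],
        }],
      }],
    }],
  };
}

// Post the payload; resolves to the HTTP status code returned by the collector.
async function sendSmokeSpan(url = "http://localhost:4318/v1/traces"): Promise<number> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildPayload("smoke-test", "hello-jaeger", 25)),
  });
  return res.status;
}
```

Calling sendSmokeSpan() with the container running should resolve to a 2xx status, after which a smoke-test service should be selectable in the UI at http://localhost:16686.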

Intro

Jaeger is a CNCF-graduated distributed tracing platform originally developed at Uber. It captures, stores, and visualizes traces: sequences of spans showing how a request flows through multiple microservices. That makes it well suited to debugging latency and failures in distributed systems.
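To make the idea of a trace as a set of linked spans concrete, here is a small sketch that nests spans under their parents and prints an indented waterfall, loosely mirroring what the Jaeger UI shows. The Span shape is a simplified, hypothetical one (real spans carry trace IDs, tags, logs, and more), and renderWaterfall is an illustrative helper, not a Jaeger API.

```typescript
// Hypothetical minimal span shape; real Jaeger/OTel spans have many more fields.
interface Span {
  spanId: string;
  parentSpanId?: string; // undefined for the root span
  service: string;
  operation: string;
  startMs: number;       // offset from the start of the trace
  durationMs: number;
}

// Render a trace as indented lines: children under their parent, ordered by start time.
// Spans whose parent is missing from the input are not rendered in this sketch.
function renderWaterfall(spans: Span[]): string[] {
  const children = new Map<string | undefined, Span[]>();
  for (const s of spans) {
    const list = children.get(s.parentSpanId) ?? [];
    list.push(s);
    children.set(s.parentSpanId, list);
  }
  const lines: string[] = [];
  const visit = (parent: string | undefined, depth: number): void => {
    const kids = (children.get(parent) ?? []).sort((a, b) => a.startMs - b.startMs);
    for (const s of kids) {
      lines.push(`${"  ".repeat(depth)}${s.service} ${s.operation} +${s.startMs}ms (${s.durationMs}ms)`);
      visit(s.spanId, depth + 1);
    }
  };
  visit(undefined, 0);
  return lines;
}

// Spans arrive in arbitrary order; the parent links are what define the structure.
const trace: Span[] = [
  { spanId: "a", service: "gateway", operation: "GET /orders", startMs: 0, durationMs: 120 },
  { spanId: "c", parentSpanId: "b", service: "db", operation: "SELECT orders", startMs: 20, durationMs: 60 },
  { spanId: "b", parentSpanId: "a", service: "orders", operation: "listOrders", startMs: 10, durationMs: 100 },
];
console.log(renderWaterfall(trace).join("\n"));
```

The gateway span comes out first, with the orders span indented beneath it and the db span one level deeper.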

What Jaeger Does

  • Trace collection: receives spans via OTLP, the Jaeger protocol, and Zipkin
  • Storage backends: Elasticsearch, Cassandra, Badger, and in-memory, with Kafka as an intermediate buffer
  • Query API: search traces by service, operation, tags, and duration
  • UI: waterfall view of spans and a service dependency graph
  • Sampling: adaptive, probabilistic, and rate-limited
  • Service Performance Monitoring (SPM): RED metrics derived from traces
  • Critical Path: highlights bottleneck spans
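The RED metrics mentioned under SPM (rate, errors, duration) can be illustrated with a little arithmetic over finished spans. The sketch below only shows the calculation; Jaeger's SPM does this aggregation inside the pipeline rather than in application code. The FinishedSpan and RedMetrics types, the redMetrics helper, and the nearest-rank p95 are assumptions chosen for illustration.

```typescript
// Hypothetical, simplified view of a finished span.
interface FinishedSpan {
  service: string;
  operation: string;
  durationMs: number;
  error: boolean;
}

interface RedMetrics {
  ratePerSec: number; // Rate: spans per second over the window
  errorRatio: number; // Errors: fraction of spans flagged as errors
  p95Ms: number;      // Duration: 95th percentile latency (nearest rank)
}

// Group spans by "service:operation" and compute RED metrics per group.
function redMetrics(spans: FinishedSpan[], windowSec: number): Map<string, RedMetrics> {
  const groups = new Map<string, FinishedSpan[]>();
  for (const s of spans) {
    const key = `${s.service}:${s.operation}`;
    const group = groups.get(key) ?? [];
    group.push(s);
    groups.set(key, group);
  }
  const out = new Map<string, RedMetrics>();
  groups.forEach((group, key) => {
    const sorted = group.map((s) => s.durationMs).sort((a, b) => a - b);
    const idx = Math.max(0, Math.ceil(0.95 * sorted.length) - 1); // nearest-rank index
    out.set(key, {
      ratePerSec: group.length / windowSec,
      errorRatio: group.filter((s) => s.error).length / group.length,
      p95Ms: sorted[idx],
    });
  });
  return out;
}

// Four spans observed in a 2-second window, one of them an error.
const sample: FinishedSpan[] = [
  { service: "api", operation: "GET /users", durationMs: 10, error: false },
  { service: "api", operation: "GET /users", durationMs: 20, error: false },
  { service: "api", operation: "GET /users", durationMs: 30, error: false },
  { service: "api", operation: "GET /users", durationMs: 400, error: true },
];
console.log(redMetrics(sample, 2).get("api:GET /users"));
```

With so few samples the p95 is simply the slowest span, which is a reminder that percentile latencies only become meaningful over larger windows.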

Architecture

Jaeger components:

  • Agent: local daemon (deprecated; send OTLP to the Collector instead)
  • Collector: receives spans and writes them to storage
  • Query: serves the UI and API, reading from storage
  • Ingester: reads spans from Kafka and writes them to storage in the asynchronous pipeline
  • All-in-one: single dev container bundling all components
  • OpenTelemetry Collector: the preferred ingestion path for new deployments

Self-Hosting

# Production deployment
components:
  - Collector (multi-replica, behind LB)
  - Elasticsearch cluster (storage)
  - Query service (multi-replica)
  - OTel Collector (ingestion)

Kubernetes: use the official Jaeger Operator or Helm charts.

Key Features

  • OpenTelemetry native ingestion
  • Multiple storage backends
  • Service dependency graph
  • Adaptive sampling
  • Trace search and filtering
  • RED metrics (SPM)
  • Zipkin compatibility
  • gRPC and HTTP APIs
  • Kubernetes operator
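As an illustration of trace search and filtering over HTTP, the sketch below builds a search URL for the query service that backs the UI on port 16686. It rests on assumptions: the /api/traces path and the service, operation, tags, minDuration, maxDuration, and limit parameters reflect what the Jaeger UI sends to its backend, which is an internal API that may change between versions, so check it against your deployment. TraceSearch and buildSearchUrl are hypothetical names.

```typescript
// Hypothetical search options; parameter names are assumed from the UI's requests.
interface TraceSearch {
  service: string;
  operation?: string;
  tags?: Record<string, string>; // sent as a JSON-encoded object
  minDuration?: string;          // e.g. "250ms"
  maxDuration?: string;          // e.g. "2s"
  limit?: number;
}

// Build a trace search URL, skipping any option that was not provided.
function buildSearchUrl(base: string, q: TraceSearch): string {
  const parts: string[] = [];
  const add = (key: string, value: string | number | undefined): void => {
    if (value !== undefined) parts.push(`${key}=${encodeURIComponent(String(value))}`);
  };
  add("service", q.service);
  add("operation", q.operation);
  add("tags", q.tags ? JSON.stringify(q.tags) : undefined);
  add("minDuration", q.minDuration);
  add("maxDuration", q.maxDuration);
  add("limit", q.limit);
  return `${base}/api/traces?${parts.join("&")}`;
}

// Slow, failed requests for the service instrumented in Quick Use.
console.log(buildSearchUrl("http://localhost:16686", {
  service: "tokrepo-api",
  tags: { error: "true" },
  minDuration: "250ms",
  limit: 20,
}));
```

The resulting URL can be fetched with any HTTP client. For anything long-lived, a documented and versioned API of the query service is likely a safer target than this UI-facing endpoint.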

Comparison

| Tracing | Storage | OTel | Metrics |
| --- | --- | --- | --- |
| Jaeger | ES, Cassandra, Kafka | Yes | SPM |
| Tempo | Object storage | Yes | Via Grafana |
| Zipkin | ES, MySQL, Cassandra | Yes | Partial |
| Honeycomb | Managed | Yes | Yes |
| Lightstep | Managed | Yes | Yes |
| OpenTelemetry Collector | Any backend | Native | Yes |

FAQ

Q: Jaeger vs Tempo? A: Jaeger has its own UI and a mature ecosystem. Tempo stores traces in object storage (cheap), is viewed through Grafana, and integrates more closely with Loki and Prometheus.

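Q: What sampling strategy should I use? A: Do not sample 100% in production (it wastes storage). Use probabilistic sampling at 1% plus tag-based force-keep, so that errors and slow requests are always retained.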
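The "1% plus force-keep" policy can be sketched as a single decision function. Note the assumption built into it: knowing whether a trace had an error or was slow requires its outcome, so this decision has to run after the trace has finished (tail-based sampling, typically in a collector), not when the request starts. The TraceSummary shape, the helper names, and the 1000 ms threshold are hypothetical, and the hashing scheme is a simplification rather than the algorithm of any particular SDK.

```typescript
// Hypothetical summary of a completed trace.
interface TraceSummary {
  traceId: string;   // 32 hex characters
  hasError: boolean;
  durationMs: number;
}

// Map the last 8 hex characters of the trace ID to [0, 1).
// The same trace ID always gives the same value, so every service agrees on the decision.
function traceIdFraction(traceId: string): number {
  return parseInt(traceId.slice(-8), 16) / 0x100000000;
}

// Keep every error and slow trace; keep roughly `ratio` of everything else.
function shouldKeep(t: TraceSummary, ratio = 0.01, slowMs = 1000): boolean {
  if (t.hasError || t.durationMs >= slowMs) return true; // force-keep
  return traceIdFraction(t.traceId) < ratio;             // probabilistic baseline
}

// A fast, successful trace whose ID hashes high is dropped...
console.log(shouldKeep({ traceId: "f".repeat(32), hasError: false, durationMs: 5 }));
// ...but the same trace is kept once it is marked as an error.
console.log(shouldKeep({ traceId: "f".repeat(32), hasError: true, durationMs: 5 }));
```

Deriving the decision from the trace ID rather than from a random number keeps whole traces intact: either every span of a trace is kept or none is.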

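Q: How does Jaeger relate to OpenTelemetry? A: OTel is the standard (API + SDK + Collector), and Jaeger is the backend. New projects should use OTel for collection and Jaeger for storage and querying.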
