Configs · Apr 11, 2026 · 3 min read

Jaeger — CNCF Distributed Tracing Platform

Jaeger is a CNCF-graduated distributed tracing system for monitoring microservice-based architectures. Track requests across services, identify latency hotspots, and understand root causes of failures in complex distributed systems.

TL;DR
CNCF-graduated distributed tracing system that tracks requests across services, identifies latency hotspots, and diagnoses failures.
§01

What it is

Jaeger is a CNCF-graduated distributed tracing system designed for monitoring and troubleshooting microservice-based architectures. It tracks requests as they flow across multiple services, showing the full call chain with timing data for each hop.

Jaeger helps developers identify latency hotspots, understand service dependencies, and diagnose root causes of failures in complex distributed systems. It supports OpenTelemetry natively and stores trace data in Elasticsearch, OpenSearch, or Cassandra, with Kafka available as a buffer in front of the storage backend.
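To make the "call chain with timing data for each hop" concrete, here is a minimal sketch that rebuilds a call chain from parent links and computes how much time each hop spent in its own code. The span tuples, service names, and durations are invented for illustration and are not Jaeger's storage format; the self-time calculation also assumes child calls run one after another rather than in parallel.

```python
from collections import defaultdict

# Hypothetical span records: (span_id, parent_id, service, operation, duration_ms).
SPANS = [
    ("a", None, "gateway", "GET /checkout", 120),
    ("b", "a", "orders", "create-order", 95),
    ("c", "b", "payments", "charge-card", 70),
    ("d", "b", "inventory", "reserve-stock", 15),
]

def render_call_chain(spans):
    """Return indented lines showing each hop and its duration."""
    children = defaultdict(list)
    for span in spans:
        children[span[1]].append(span)  # group spans by parent ID

    lines = []

    def walk(parent_id, depth):
        for span_id, _, service, operation, duration_ms in children[parent_id]:
            lines.append(f"{'  ' * depth}{service}: {operation} ({duration_ms} ms)")
            walk(span_id, depth + 1)

    walk(None, 0)  # root spans have no parent
    return lines

def self_time(spans):
    """Duration of each span minus the time spent in its direct children."""
    child_total = defaultdict(int)
    for _, parent_id, _, _, duration_ms in spans:
        child_total[parent_id] += duration_ms
    return {span_id: duration_ms - child_total[span_id]
            for span_id, _, _, _, duration_ms in spans}

print("\n".join(render_call_chain(SPANS)))
print(self_time(SPANS))
```

In this made-up trace the gateway span is the longest overall, but most of the time is spent inside the payments hop, which is the kind of hotspot the Jaeger timeline view makes visible.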

§02

How it saves time or tokens

Debugging latency in a microservice architecture without distributed tracing means grepping logs across dozens of services. Jaeger provides a visual timeline of every service call in a request, immediately showing where time is spent.

For AI-assisted debugging, Jaeger's structured trace data can be exported as JSON and fed to an LLM for analysis. The model can identify patterns like cascading timeouts or retry storms that are hard to spot manually.
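As a rough sketch of that workflow, the snippet below condenses an exported trace into a few lines of text that fit comfortably in a prompt. The JSON layout (`data`, `spans`, `processes`, durations in microseconds) is an assumption based on what the Jaeger UI typically exports, so verify the field names against your own export; `summarize_trace` and the sample data are hypothetical.

```python
import json

# Assumed shape of a trace exported from the Jaeger UI; field names may differ
# between Jaeger versions, so check them against your own export.
SAMPLE_EXPORT = json.dumps({
    "data": [{
        "traceID": "abc123",
        "spans": [
            {"spanID": "1", "operationName": "GET /checkout", "duration": 120000,
             "processID": "p1", "tags": []},
            {"spanID": "2", "operationName": "charge-card", "duration": 70000,
             "processID": "p2",
             "tags": [{"key": "error", "type": "bool", "value": True}]},
            {"spanID": "3", "operationName": "reserve-stock", "duration": 15000,
             "processID": "p3", "tags": []},
        ],
        "processes": {
            "p1": {"serviceName": "gateway"},
            "p2": {"serviceName": "payments"},
            "p3": {"serviceName": "inventory"},
        },
    }]
})

def summarize_trace(export_json, top_n=2):
    """Condense one exported trace into a short text summary for a prompt."""
    trace = json.loads(export_json)["data"][0]
    services = {pid: p["serviceName"] for pid, p in trace["processes"].items()}
    # Slowest spans first; durations are assumed to be in microseconds.
    spans = sorted(trace["spans"], key=lambda s: s["duration"], reverse=True)

    lines = [f"trace {trace['traceID']}: {len(spans)} spans"]
    for span in spans[:top_n]:
        failed = any(t["key"] == "error" and t["value"] is True for t in span["tags"])
        lines.append(
            f"{services[span['processID']]} {span['operationName']} "
            f"{span['duration'] / 1000:.0f} ms{' ERROR' if failed else ''}"
        )
    return "\n".join(lines)

print(summarize_trace(SAMPLE_EXPORT))
```

Sending a summary like this instead of the raw export keeps the prompt small, at the cost of hiding detail the model might need, so the cutoff (`top_n`) is a judgment call.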

Jaeger's documentation is well structured and its community is active, so developers spend less time troubleshooting integration issues. AI coding assistants can draw on the documented OpenTelemetry instrumentation patterns when generating Jaeger code, which tends to produce working implementations in fewer iterations and at lower token cost.

§03

How to use

  1. Run Jaeger all-in-one for development (port 16686 serves the UI; 4317 and 4318 accept OTLP over gRPC and HTTP):
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
  2. Instrument your application with OpenTelemetry:
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# service.name is the name Jaeger lists the service under; 'my-service' is a placeholder.
provider = TracerProvider(resource=Resource.create({'service.name': 'my-service'}))
# Batch spans and send them over OTLP/gRPC to the port published above (4317).
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint='http://localhost:4317'))
)
trace.set_tracer_provider(provider)
  3. Open the Jaeger UI at http://localhost:16686 to search and visualize traces.
  4. Use the service dependency graph to understand how your services connect.
§04

Example

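from opentelemetry import trace

# `order` and `call_payment_service` are placeholders for your own application code.
tracer = trace.get_tracer('my-service')

# Everything inside the block is timed and recorded as one span.
with tracer.start_as_current_span('process-order') as span:
    span.set_attribute('order.id', '12345')
    result = call_payment_service(order)
    span.set_attribute('payment.status', result.status)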
§06

Common pitfalls

  • Tracing every request in production. At high throughput, tracing everything overwhelms storage. Use sampling (1% or adaptive) to capture representative traces without drowning in data.
  • Not propagating trace context across service boundaries. If any service in the chain does not propagate the trace ID, the trace breaks into disconnected fragments.
  • Using the all-in-one deployment in production. By default it stores traces in memory and loses them on restart. Use a persistent backend such as Elasticsearch or Cassandra for production storage.
  • Failing to review community discussions and changelogs before upgrading. Breaking changes in major versions can disrupt existing workflows. Pin versions in production and test upgrades in staging first.
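The sampling advice in the first pitfall can be illustrated with a small sketch of ratio-based sampling. The idea shown here is to derive the keep-or-drop decision from the trace ID itself, so every service that sees the same trace reaches the same verdict and sampled traces stay complete. `should_sample` is a hypothetical helper written for this example, not an OpenTelemetry or Jaeger API; in a real deployment you would more likely use the samplers that ship with the OpenTelemetry SDK.

```python
import random

def should_sample(trace_id_hex, ratio):
    """Decide from the trace ID alone, so every service reaches the same verdict."""
    # Treat the low 64 bits of the ID as a uniformly distributed number.
    low_bits = int(trace_id_hex, 16) & 0xFFFFFFFFFFFFFFFF
    return low_bits < int(ratio * 2**64)

# Simulate 100,000 requests with random 128-bit trace IDs (seeded for repeatability).
rng = random.Random(42)
trace_ids = [f"{rng.getrandbits(128):032x}" for _ in range(100_000)]
kept = sum(should_sample(tid, 0.01) for tid in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

With a 1% ratio, roughly one trace in a hundred is kept. A fixed ratio can miss rare errors or slow requests, which is one reason the text also mentions adaptive sampling.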

Frequently Asked Questions

What is distributed tracing?

Distributed tracing tracks a single request as it flows through multiple microservices. Each service creates a span (a unit of work) with timing data. Spans are linked by a shared trace ID, creating a tree that shows the full request lifecycle across services.
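The shared trace ID has to travel with each request. With OpenTelemetry's default propagator it is carried in the W3C `traceparent` HTTP header, in the form `00-<trace id>-<parent span id>-<flags>`. The sketch below writes and reads that header by hand to show what is being passed along; `inject`, `extract`, and `new_span_id` are hypothetical helpers for illustration, and the parser is simplified (it accepts only version `00` and does not reject all-zero IDs). In practice the OpenTelemetry instrumentation libraries handle this for you.

```python
import re
import secrets

# version "00", 32 hex chars of trace ID, 16 hex chars of span ID, 2 hex chars of flags
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def new_span_id():
    return secrets.token_hex(8)  # 8 random bytes -> 16 hex characters

def inject(headers, trace_id, span_id, sampled=True):
    """Write the trace context into outgoing request headers."""
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"
    return headers

def extract(headers):
    """Read the trace context from incoming headers; None if absent or malformed."""
    match = TRACEPARENT.match(headers.get("traceparent", ""))
    if not match:
        return None
    trace_id, parent_span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": (int(flags, 16) & 1) == 1,  # lowest bit is the sampled flag
    }

# Service A starts a trace and calls service B.
trace_id = secrets.token_hex(16)
outgoing = inject({}, trace_id, new_span_id())

# Service B continues the same trace under a new span ID of its own.
incoming = extract(outgoing)
child_span_id = new_span_id()
print(incoming["trace_id"] == trace_id)
```

If a service in the middle drops this header, the next service finds no context, starts a fresh trace ID, and the request shows up as separate, disconnected traces.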

How does Jaeger compare to Zipkin?

Both are open-source distributed tracing systems. Jaeger is CNCF-graduated with a more active community and better OpenTelemetry integration. Zipkin is simpler to deploy and has broader language support for legacy instrumentation. Both support the same core tracing concepts.

Does Jaeger support OpenTelemetry?

Yes. Jaeger natively accepts OpenTelemetry data via the OTLP protocol. You instrument your applications with OpenTelemetry SDKs and export traces directly to Jaeger. This is the recommended approach for new deployments.

What storage backends does Jaeger support?

Jaeger supports Elasticsearch, OpenSearch, Cassandra, Kafka (as a buffer), and an in-memory store for development. Elasticsearch is the most common production choice due to its query capabilities and operational maturity.

Can Jaeger trace AI agent interactions?

Yes. You can create spans for each step in an AI agent workflow: LLM calls, tool invocations, retrieval operations. This gives visibility into where time and tokens are spent in AI agent pipelines, helping optimize both latency and cost.
