Configs · Apr 21, 2026 · 3 min read

Zipkin — Distributed Tracing System for Microservices

A guide to Zipkin, the distributed tracing system that helps gather timing data to troubleshoot latency problems in microservice architectures.

Introduction

Zipkin is a distributed tracing system originally developed at Twitter, inspired by Google's Dapper paper. It helps collect timing data for requests as they traverse multiple services, making it straightforward to identify latency bottlenecks and understand service dependencies in microservice architectures.

What Zipkin Does

  • Collects trace data (spans) from instrumented services showing request flow and timing
  • Provides a web UI for searching traces by service name, span name, duration, and tags
  • Visualizes service dependency graphs based on collected trace data
  • Stores traces in pluggable backends including Elasticsearch, Cassandra, and MySQL
  • Accepts spans via HTTP, Kafka, RabbitMQ, or gRPC transport protocols
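
To make the span model concrete, here is a minimal Python sketch that builds two spans in Zipkin's v2 JSON format and prepares, but does not send, an HTTP POST to a collector. It assumes a Zipkin server listening on its default port at http://localhost:9411. The helper names (new_id, make_span, build_request) and the service and span names are hypothetical, chosen for illustration.

```python
import json
import secrets
import time
import urllib.request
from typing import Dict, List, Optional

# Assumption: a local Zipkin server on its default port.
ZIPKIN_SPANS_URL = "http://localhost:9411/api/v2/spans"


def new_id(hex_chars: int = 16) -> str:
    """Random lowercase hex ID: 16 chars (64-bit span ID) or 32 chars (128-bit trace ID)."""
    return secrets.token_hex(hex_chars // 2)


def make_span(service: str, name: str, trace_id: str, start_us: int, duration_us: int,
              kind: str = "SERVER", parent_id: Optional[str] = None,
              tags: Optional[Dict[str, str]] = None) -> dict:
    """Build one span in Zipkin's v2 JSON model; times are epoch microseconds."""
    span = {
        "traceId": trace_id,
        "id": new_id(16),
        "name": name,
        "kind": kind,
        "timestamp": start_us,
        "duration": duration_us,
        "localEndpoint": {"serviceName": service},
        "tags": tags or {},
    }
    if parent_id is not None:
        span["parentId"] = parent_id  # links this span to the span that caused it
    return span


def build_request(spans: List[dict], url: str = ZIPKIN_SPANS_URL) -> urllib.request.Request:
    """Prepare a POST whose body is a JSON array of spans (send with urllib.request.urlopen)."""
    body = json.dumps(spans).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


# Hypothetical two-span trace: "frontend" handles a request and calls "inventory".
trace_id = new_id(32)
now_us = int(time.time() * 1_000_000)
root = make_span("frontend", "get /checkout", trace_id, now_us, 120_000)
child = make_span("inventory", "get /stock", trace_id, now_us + 5_000, 80_000,
                  parent_id=root["id"], tags={"http.method": "GET"})
req = build_request([root, child])
```

Passing req to urllib.request.urlopen would submit both spans in one call; a running Zipkin server normally answers such a POST with 202 Accepted. Batching spans into one array is what instrumentation libraries typically do to keep reporting overhead low.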

Architecture Overview

Zipkin's server side has four components: the collector, storage, the search (query) API, and the web UI. Instrumented services (using Brave, OpenTelemetry, or other Zipkin-compatible libraries) generate spans and report them to the collector, which validates, indexes, and stores them. The storage backend persists data in Elasticsearch, Cassandra, MySQL, or memory. The query API retrieves traces by service, span name, tags, duration, and time range, and the web UI, served by the same Zipkin server, uses it to render trace timelines and dependency diagrams.
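
The query side is a plain HTTP API, so it can be scripted without the UI. The Python sketch below builds search URLs for Zipkin's v2 endpoints GET /api/v2/traces and GET /api/v2/dependencies using only the standard library. It assumes a server at http://localhost:9411; the function names are hypothetical, and durations are in microseconds while endTs and lookback are in milliseconds.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def trace_search_url(base_url: str, service: str, span_name: Optional[str] = None,
                     tags: Optional[Dict[str, str]] = None,
                     min_duration_us: Optional[int] = None,
                     lookback_ms: Optional[int] = None, limit: int = 10) -> str:
    """Build a GET /api/v2/traces URL (minDuration in µs, lookback in ms)."""
    params: Dict[str, object] = {"serviceName": service, "limit": limit}
    if span_name:
        params["spanName"] = span_name
    if tags:
        # annotationQuery joins key=value terms with " and "
        params["annotationQuery"] = " and ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    if min_duration_us is not None:
        params["minDuration"] = min_duration_us
    if lookback_ms is not None:
        params["lookback"] = lookback_ms
    return f"{base_url.rstrip('/')}/api/v2/traces?{urlencode(params)}"


def dependencies_url(base_url: str, end_ts_ms: int, lookback_ms: int) -> str:
    """Build a GET /api/v2/dependencies URL for the window ending at end_ts_ms."""
    query = urlencode({"endTs": end_ts_ms, "lookback": lookback_ms})
    return f"{base_url.rstrip('/')}/api/v2/dependencies?{query}"
```

For example, trace_search_url("http://localhost:9411", "frontend", tags={"http.method": "GET"}, min_duration_us=100_000) asks for up to ten "frontend" traces slower than 100 ms that carry that tag.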

Self-Hosting & Configuration

  • Deploy via Docker, Java JAR, or Kubernetes Helm chart for quick setup
  • Set the STORAGE_TYPE environment variable to choose a backend: mem (the default), elasticsearch, cassandra3, or mysql
  • Set KAFKA_BOOTSTRAP_SERVERS to also collect spans from Kafka; the HTTP collector stays enabled unless you turn it off
  • Tune COLLECTOR_SAMPLE_RATE (a fraction from 0.0 to 1.0, default 1.0) to control what share of traces the collector stores
  • Use the Zipkin Lens UI (the default UI in recent 2.x releases) for trace search and dependency analysis
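
Because these settings are plain environment variables, a deployment script can assemble and sanity-check them before starting the container. This Python sketch builds an environment map and the matching docker run argument list for the openzipkin/zipkin image. The STORAGE_TYPE values and the ES_HOSTS variable reflect the zipkin-server documentation as we understand it, so verify them against the release you run; the helper names are hypothetical.

```python
from typing import Dict, List, Optional

# Assumption: storage type names as listed in the zipkin-server docs.
VALID_STORAGE_TYPES = {"mem", "cassandra3", "elasticsearch", "mysql"}


def zipkin_env(storage_type: str = "mem", es_hosts: Optional[str] = None,
               kafka_bootstrap: Optional[str] = None,
               sample_rate: float = 1.0) -> Dict[str, str]:
    """Validate settings and return the environment for a Zipkin server container."""
    if storage_type not in VALID_STORAGE_TYPES:
        raise ValueError(f"unsupported STORAGE_TYPE: {storage_type}")
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("COLLECTOR_SAMPLE_RATE must be between 0.0 and 1.0")
    env = {"STORAGE_TYPE": storage_type, "COLLECTOR_SAMPLE_RATE": str(sample_rate)}
    if storage_type == "elasticsearch" and es_hosts:
        env["ES_HOSTS"] = es_hosts  # e.g. "http://elasticsearch:9200"
    if kafka_bootstrap:
        env["KAFKA_BOOTSTRAP_SERVERS"] = kafka_bootstrap  # enables the Kafka collector
    return env


def docker_run_args(env: Dict[str, str], image: str = "openzipkin/zipkin") -> List[str]:
    """Build a `docker run` argument list that publishes the default port 9411."""
    args = ["docker", "run", "-d", "-p", "9411:9411"]
    for key, value in sorted(env.items()):
        args += ["-e", f"{key}={value}"]
    args.append(image)
    return args
```

The resulting list can be handed to subprocess.run(args, check=True). Validating up front turns a typo in a storage name or an out-of-range rate into an immediate error instead of a container that starts with settings you did not intend.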

Key Features

  • Language-agnostic instrumentation with libraries for Java, Go, Python, JavaScript, Ruby, C#, and more
  • OpenTelemetry-compatible: OTLP spans can be routed through the OpenTelemetry Collector, whose Zipkin exporter forwards them to Zipkin
  • Service dependency graph auto-generated from trace data without manual configuration
  • Trace comparison view to diff two traces side-by-side for performance regression analysis
  • Low resource footprint: the server runs as a single JAR with in-memory storage for development
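
The dependency graph needs no manual configuration because call edges can be read off the spans themselves. The Python sketch below shows a deliberately simplified version of that idea: it links each span to its parent and counts a call wherever the service name changes, producing records shaped like the parent/child/callCount/errorCount entries that /api/v2/dependencies returns. Zipkin's own linker also accounts for span kind and remote endpoints, so treat this as an illustration rather than its algorithm. The sample spans are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def dependency_links(spans: List[dict]) -> List[Dict[str, object]]:
    """Derive service-to-service call counts from parent/child span links (simplified)."""
    by_key = {(s["traceId"], s["id"]): s for s in spans}
    calls: Counter = Counter()
    errors: Counter = Counter()
    for span in spans:
        parent = by_key.get((span["traceId"], span.get("parentId")))
        if parent is None:
            continue  # root span, or its parent was not collected
        edge: Tuple[str, str] = (parent["localEndpoint"]["serviceName"],
                                 span["localEndpoint"]["serviceName"])
        if edge[0] == edge[1]:
            continue  # same service: an in-process child, not a remote call
        calls[edge] += 1
        if "error" in span.get("tags", {}):
            errors[edge] += 1
    return [{"parent": p, "child": c, "callCount": n, "errorCount": errors[(p, c)]}
            for (p, c), n in sorted(calls.items())]


# Hypothetical trace: frontend -> inventory -> db, where the db call failed.
spans = [
    {"traceId": "t1", "id": "a", "localEndpoint": {"serviceName": "frontend"}},
    {"traceId": "t1", "id": "b", "parentId": "a", "localEndpoint": {"serviceName": "inventory"}},
    {"traceId": "t1", "id": "c", "parentId": "b", "localEndpoint": {"serviceName": "inventory"}},
    {"traceId": "t1", "id": "d", "parentId": "c", "localEndpoint": {"serviceName": "db"},
     "tags": {"error": "timeout"}},
]
links = dependency_links(spans)
```

Here links contains one frontend to inventory edge and one inventory to db edge with errorCount 1; the inventory-internal span "c" adds no edge of its own.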

Comparison with Similar Tools

  • Jaeger — CNCF tracing with adaptive sampling; Zipkin has broader language support and simpler single-binary deployment
  • Tempo (Grafana) — Object-storage-backed tracing; Zipkin provides its own UI while Tempo relies on Grafana
  • AWS X-Ray — Managed tracing for AWS; Zipkin is self-hosted and vendor-neutral
  • Datadog APM — Commercial full-stack observability; Zipkin is free and open source with pluggable storage
  • SigNoz — All-in-one observability; Zipkin focuses purely on distributed tracing with maximum flexibility

FAQ

Q: How does Zipkin compare to Jaeger? A: Both solve distributed tracing. Zipkin has a longer history and wider instrumentation library support. Jaeger offers adaptive sampling and is a CNCF graduated project. Both can accept OpenTelemetry data.

Q: Can Zipkin handle high-volume production traffic? A: Yes. Use Kafka as the span transport and Elasticsearch or Cassandra as storage. Configure sampling to control volume.
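
Sampling works best when it is decided per trace, so that a kept trace is kept whole. A common way to do that, and, as we understand it, the idea behind Zipkin's collector-side sampler, is to compare the lower 64 bits of the trace ID against a boundary derived from the rate. The Python sketch below illustrates that approach under this assumption; it is not Zipkin's code, and the function names are hypothetical.

```python
import random

_SIGNED_MAX = (1 << 63) - 1


def keep_trace(trace_id_hex: str, rate: float, debug: bool = False) -> bool:
    """Deterministic keep/drop decision for a whole trace, based only on its ID."""
    if debug:
        return True  # debug traces bypass sampling
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    boundary = int(_SIGNED_MAX * rate)
    low64 = int(trace_id_hex[-16:], 16)  # lower 64 bits of a 64- or 128-bit ID
    if low64 >= 1 << 63:
        low64 -= 1 << 64  # reinterpret as a signed 64-bit value
    # Take the magnitude; the most negative value has no positive twin, so clamp it.
    magnitude = _SIGNED_MAX if low64 == -(1 << 63) else abs(low64)
    return magnitude <= boundary


def sampled_fraction(rate: float, n: int = 10_000, seed: int = 7) -> float:
    """Estimate the share of random 128-bit trace IDs kept at a given rate."""
    rng = random.Random(seed)
    ids = (format(rng.getrandbits(128), "032x") for _ in range(n))
    return sum(keep_trace(t, rate) for t in ids) / n
```

Because the decision depends only on the trace ID, every collector instance reaches the same verdict for every span of a trace, with no coordination between them.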

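Q: Does Zipkin support OpenTelemetry? A: Yes. Zipkin accepts spans from the OpenTelemetry Collector via the Collector's Zipkin exporter. Zipkin also defines its own native context-propagation format, B3, and OpenTelemetry provides B3 propagators, so services instrumented with either can share trace context.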
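
B3 context travels in HTTP headers, either as several X-B3-* headers or as a single b3 header of the form {TraceId}-{SpanId}-{SamplingState}-{ParentSpanId}, where the last two fields are optional. The Python sketch below builds both forms and parses the single-header form. The IDs in the usage are illustrative values and the function names are hypothetical.

```python
from typing import Dict, Optional


def b3_multi_headers(trace_id: str, span_id: str, parent_span_id: Optional[str] = None,
                     sampled: bool = True) -> Dict[str, str]:
    """Multi-header B3 encoding."""
    headers = {
        "X-B3-TraceId": trace_id,
        "X-B3-SpanId": span_id,
        "X-B3-Sampled": "1" if sampled else "0",
    }
    if parent_span_id:
        headers["X-B3-ParentSpanId"] = parent_span_id
    return headers


def b3_single_header(trace_id: str, span_id: str, parent_span_id: Optional[str] = None,
                     sampled: bool = True) -> str:
    """Single-header encoding: {TraceId}-{SpanId}-{SamplingState}-{ParentSpanId}."""
    parts = [trace_id, span_id, "1" if sampled else "0"]
    if parent_span_id:
        parts.append(parent_span_id)
    return "-".join(parts)


def parse_b3_single(value: str) -> Dict[str, str]:
    """Parse a b3 header; a lone "0", "1", or "d" (debug) carries only the sampling state."""
    parts = value.strip().split("-")
    if len(parts) == 1:
        return {"sampled": parts[0]}
    result = {"traceId": parts[0], "spanId": parts[1]}
    if len(parts) >= 3:
        result["sampled"] = parts[2]
    if len(parts) >= 4:
        result["parentSpanId"] = parts[3]
    return result
```

The single-header form exists mainly to save bytes and to fit transports that limit header counts; both forms carry the same information.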

Q: What is the difference between Zipkin and OpenTelemetry? A: OpenTelemetry is a vendor-neutral instrumentation standard. Zipkin is a tracing backend and UI. They work together: OpenTelemetry generates spans and Zipkin stores and visualizes them.
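
The division of labor is easiest to see in the data mapping. The Python sketch below converts a simplified OpenTelemetry span into Zipkin's v2 JSON model: nanosecond timestamps become microseconds, OTLP span kinds map to Zipkin kinds, and INTERNAL spans carry no kind. For brevity it assumes a flat dict with OTLP-style field names, hex IDs, and plain attribute values (real OTLP/JSON wraps attribute values in typed objects), and it takes the service name as an argument, which in practice comes from the resource's service.name attribute. It only approximates what the OpenTelemetry Collector's Zipkin exporter does; the function name is hypothetical.

```python
from typing import Dict

# OTLP SpanKind enum: 0 UNSPECIFIED, 1 INTERNAL, 2 SERVER, 3 CLIENT, 4 PRODUCER, 5 CONSUMER.
# Only the last four have a Zipkin equivalent; the others leave "kind" unset.
_OTLP_TO_ZIPKIN_KIND: Dict[int, str] = {2: "SERVER", 3: "CLIENT", 4: "PRODUCER", 5: "CONSUMER"}


def otel_to_zipkin(span: dict, service_name: str) -> dict:
    """Map a simplified OTLP-style span dict to a Zipkin v2 span dict."""
    start_ns = int(span["startTimeUnixNano"])  # OTLP/JSON encodes these as strings
    end_ns = int(span["endTimeUnixNano"])
    zipkin = {
        "traceId": span["traceId"].lower(),
        "id": span["spanId"].lower(),
        "name": span["name"],
        "timestamp": start_ns // 1_000,            # nanoseconds -> microseconds
        "duration": (end_ns - start_ns) // 1_000,  # nanoseconds -> microseconds
        "localEndpoint": {"serviceName": service_name},
        # Zipkin tags are string-to-string, so attribute values are stringified.
        "tags": {key: str(value) for key, value in span.get("attributes", {}).items()},
    }
    if span.get("parentSpanId"):
        zipkin["parentId"] = span["parentSpanId"].lower()
    kind = _OTLP_TO_ZIPKIN_KIND.get(span.get("kind", 0))
    if kind:
        zipkin["kind"] = kind
    return zipkin
```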
