Configs · Apr 16, 2026 · 4 min read

Grafana Tempo — Massively Scalable Distributed Tracing Backend

Grafana Tempo is a high-volume, minimal-dependency distributed tracing backend that ingests OpenTelemetry, Jaeger, and Zipkin spans into cheap object storage and integrates natively with Grafana for trace exploration.

Introduction

Most tracing backends require a search index (Elasticsearch, Cassandra) sized nearly as large as the trace data itself, making full-fidelity tracing prohibitively expensive. Tempo's insight is that the vast majority of trace lookups are "fetch this trace by its ID" — and the ID is usually already in your logs or exemplar metrics. Tempo therefore writes traces directly to object storage (S3/GCS/Azure Blob) without an index, and relies on Grafana to jump from a log line or Prometheus exemplar straight into the exact trace. For searches that go beyond ID lookup, it ships TraceQL, a trace query language evaluated by the query tier directly against the stored blocks.

What Tempo Does

  • Ingests spans via OTLP, Jaeger, Zipkin, OpenCensus, and Kafka.
  • Stores traces as compressed Parquet blocks in object storage with a compactor behind them.
  • Looks up traces by ID in single-digit seconds at petabyte scale.
  • Executes TraceQL queries for finding traces matching structural and attribute conditions.
  • Generates service graphs and metrics from spans via the metrics-generator.
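As a sketch of what enabling several of these ingest protocols looks like, a minimal single-node configuration might resemble the following — ports, paths, and the choice of receivers are illustrative, not a recommended production setup:

```yaml
# tempo.yaml — minimal single-node sketch; paths and ports are illustrative
server:
  http_listen_port: 3200

distributor:
  receivers:              # enable only the protocols you actually ingest
    otlp:
      protocols:
        grpc:             # OTLP/gRPC (default port 4317)
        http:             # OTLP/HTTP (default port 4318)
    jaeger:
      protocols:
        thrift_http:
    zipkin:

storage:
  trace:
    backend: local        # swap for s3/gcs/azure in production
    local:
      path: /var/tempo/blocks
    wal:
      path: /var/tempo/wal
```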

Architecture Overview

Tempo is a microservice architecture with distinct roles — distributor, ingester, compactor, querier, query-frontend, and metrics-generator — packed into a single binary that can be run monolithically or horizontally scaled via the Helm chart. Distributors accept spans and hash-route them by trace ID. Ingesters buffer spans in memory backed by a write-ahead log (WAL), then flush Parquet blocks to object storage. Compactors merge and deduplicate blocks. Queriers fan out to ingesters and object storage for reads. The metrics-generator derives service-graph and span metrics in real time and remote-writes them to Prometheus/Mimir.
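The monolithic-vs-microservices split is selected by the target setting (or the `-target` flag); the sketch below assumes the standard component names:

```yaml
# Monolithic: every component runs in one process
target: all

# Horizontally scaled: run one process per role instead, e.g.
#   tempo -config.file=tempo.yaml -target=distributor
#   tempo -config.file=tempo.yaml -target=ingester
#   tempo -config.file=tempo.yaml -target=compactor
#   tempo -config.file=tempo.yaml -target=querier
```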

Self-Hosting & Configuration

  • Install via the tempo-distributed Helm chart for HA; use tempo (monolithic) for small deployments.
  • Back with S3/GCS/Azure Blob/MinIO — local disk is only for WAL.
  • Tune ingester.max_block_duration and compactor.compaction_window for your ingest rate.
  • Enable the metrics-generator with remote_write pointed at Prometheus/Mimir for service graphs.
  • Use multi-tenancy via X-Scope-OrgID headers to isolate teams or clusters.
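A sketch tying the bullets above together — the bucket name, endpoint, and tuning values are illustrative assumptions, not recommendations:

```yaml
# Production-leaning sketch; bucket, endpoint, and values are illustrative
multitenancy_enabled: true        # tenants identified by X-Scope-OrgID header

storage:
  trace:
    backend: s3
    s3:
      bucket: tempo-traces        # hypothetical bucket
      endpoint: s3.us-east-1.amazonaws.com

ingester:
  max_block_duration: 30m         # cut blocks more often at high ingest rates

compactor:
  compaction:
    compaction_window: 1h         # time span each compaction pass covers

metrics_generator:
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write   # hypothetical endpoint
```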

Key Features

  • Object-storage-native — no separate search index means dramatically lower cost per span.
  • TraceQL query language with filters on resource, span, event, and duration.
  • Native integration with Grafana for logs-to-traces and metrics-to-traces exemplar jumps.
  • Metrics-generator produces RED metrics and service graphs without running a separate agent.
  • Multi-tenant by design with per-tenant limits and retention.
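Per-tenant limits and retention are typically set in an overrides file referenced from the main config; the tenant name and values below are illustrative, and the field names assume the flat overrides format:

```yaml
# tempo.yaml
overrides:
  per_tenant_override_config: /conf/overrides.yaml

# /conf/overrides.yaml — hypothetical tenant and limits
overrides:
  team-a:
    ingestion_rate_limit_bytes: 20000000   # ~20 MB/s for this tenant
    block_retention: 720h                  # 30 days, overriding the default
```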

Comparison with Similar Tools

  • Jaeger — Mature UI and collector, but requires Elasticsearch/Cassandra for storage at scale.
  • Zipkin — Simple and lightweight, but limited query and scaling story.
  • SigNoz — Full APM with ClickHouse storage; heavier than Tempo but includes UI.
  • Datadog APM — Fully managed with deep analytics; Tempo is OSS and object-storage-cheap.
  • AWS X-Ray / GCP Cloud Trace — Cloud-locked; Tempo is portable across clouds.

Grafana Integration

  • Data source plugin ships in Grafana core; configure the Tempo URL and tenant header.
  • Trace-to-logs and trace-to-metrics links let you pivot between signals in one pane.
  • Exemplars from Prometheus (or Mimir) carry trace IDs that deep-link straight into Tempo.
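For provisioned Grafana installs, the data source from the first bullet can be declared in a provisioning file; the URL, tenant, and Loki UID below are assumptions for illustration:

```yaml
# grafana/provisioning/datasources/tempo.yaml — sketch; URLs and UIDs assumed
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    url: http://tempo:3200
    jsonData:
      httpHeaderName1: X-Scope-OrgID   # tenant header for multi-tenant Tempo
      tracesToLogsV2:
        datasourceUid: loki            # hypothetical Loki data source UID
    secureJsonData:
      httpHeaderValue1: team-a         # hypothetical tenant
```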

FAQ

Q: How is Tempo cheaper than Jaeger + ES? A: Object storage (S3) is an order of magnitude cheaper than indexed storage; Tempo keeps only traces there.

Q: Can I search by attribute without a trace ID? A: Yes — TraceQL supports structural queries; Parquet columnar blocks make this fast, though not as instantaneous as ID lookup.

Q: Does Tempo support OpenTelemetry? A: First-class — OTLP is the recommended protocol for both ingest and exemplar correlation.

Q: How long can I retain traces? A: As long as your object store bill allows; retention is configured per tenant via the compactor.
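The retention answer above maps to a single compactor setting for the default tenant (the value is illustrative):

```yaml
compactor:
  compaction:
    block_retention: 336h   # keep trace blocks 14 days before deletion
```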
