Skills · Apr 14, 2026 · 3 min read

Apache Druid — Real-Time Analytics Database for Event-Driven Data

Apache Druid powers interactive analytics on real-time event data. With column-oriented storage, time-based partitioning, and a distributed architecture, it serves sub-second queries on trillions of events per day — the OLAP engine behind Netflix and Airbnb.

Apache Software Foundation · Community

Agent ready

Review-first install path

This asset needs a review step. The copied prompt tells the agent to dry-run, show the writes, then proceed only after confirmation.

Needs Confirmation · 64/100 · Policy: confirm

Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Single
Trust: Community
Entrypoint: step-1.md

Review-first command

npx -y tokrepo@latest install 0963f669-37d2-11f1-9bc6-00163e2b0d79 --target codex

Dry-run first, confirm the writes, then run this command.

TL;DR

Apache Druid delivers sub-second OLAP queries on real-time event streams with column-oriented storage and time partitioning.

§01

What it is

Apache Druid is an open-source, distributed analytics database designed for real-time event data. It uses column-oriented storage, time-based partitioning, and a shared-nothing architecture to serve sub-second queries on large-scale event streams.

Druid targets data engineers and analytics teams who need interactive dashboards and slice-and-dice analytics on high-volume event data such as clickstreams, logs, or IoT telemetry.

§02

How it saves time or tokens

Druid ingests data in real time from Kafka, Kinesis, or batch sources and makes it queryable within seconds. Unlike traditional data warehouses that require ETL pipelines with minutes-to-hours latency, Druid provides near-instant query results on fresh data, reducing the feedback loop for operational analytics.

§03

How to use

Download and start Druid:

curl -O https://dlcdn.apache.org/druid/30.0.1/apache-druid-30.0.1-bin.tar.gz
tar -xzf apache-druid-30.0.1-bin.tar.gz
cd apache-druid-30.0.1
./bin/start-druid

Open the web console at http://localhost:8888.

Load data via the console wizard or submit an ingestion spec via the API.
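
As a rough illustration of the API route, the sketch below builds a native batch ingestion spec and wraps it in a POST to the task endpoint. It assumes the quickstart Router on localhost:8888; the datasource name, input directory, and column names (timestamp, service, duration_ms) are hypothetical placeholders chosen to match the example query later on this page, not values taken from a real deployment.

```python
import json
import urllib.request

# Assumption: the quickstart Router is listening on localhost:8888.
DRUID_URL = "http://localhost:8888"


def build_ingestion_spec(datasource, base_dir, file_filter):
    """Return a native batch (index_parallel) spec for local JSON files."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Hypothetical input columns: timestamp, service, duration_ms.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {
                    "dimensions": [
                        "service",
                        {"type": "long", "name": "duration_ms"},
                    ]
                },
                "granularitySpec": {
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {
                    "type": "local",
                    "baseDir": base_dir,
                    "filter": file_filter,
                },
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


def build_task_request(spec, base_url=DRUID_URL):
    """Wrap the spec in a POST request to the task submission endpoint."""
    return urllib.request.Request(
        url=base_url + "/druid/indexer/v1/task",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_task(spec, base_url=DRUID_URL):
    """Send the spec to a running Druid and return the parsed JSON response."""
    with urllib.request.urlopen(build_task_request(spec, base_url)) as response:
        return json.load(response)


# Build the spec and show it; nothing is sent until submit_task(spec) is called.
spec = build_ingestion_spec("request_logs", "/tmp/request_logs", "*.json")
print(json.dumps(spec, indent=2))
```

Keeping the spec builder separate from the network call makes it easy to review the JSON before anything is written, which fits the dry-run-first approach this page recommends.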

§04

Example

-- Query via Druid SQL (web console or /druid/v2/sql endpoint)
SELECT
  TIME_FLOOR(__time, 'PT1H') AS hour,
  service,
  COUNT(*) AS events,
  SUM(duration_ms) AS total_duration
FROM request_logs
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
GROUP BY 1, 2
ORDER BY events DESC
LIMIT 20
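
The same query can be sent to the /druid/v2/sql endpoint from a script. The following is a minimal sketch using only the Python standard library; it assumes a Druid Router on localhost:8888 and a datasource named request_logs with service and duration_ms columns, as in the query above. The helper names are illustrative.

```python
import json
import urllib.request

# Assumption: the quickstart Router is listening on localhost:8888.
DRUID_URL = "http://localhost:8888"

HOURLY_TRAFFIC_SQL = """
SELECT
  TIME_FLOOR(__time, 'PT1H') AS "hour",
  service,
  COUNT(*) AS events,
  SUM(duration_ms) AS total_duration
FROM request_logs
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
GROUP BY 1, 2
ORDER BY events DESC
LIMIT 20
"""


def build_sql_request(sql, base_url=DRUID_URL):
    """Build a POST request carrying the SQL text as a JSON payload."""
    payload = {"query": sql, "resultFormat": "object"}
    return urllib.request.Request(
        url=base_url + "/druid/v2/sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(sql, base_url=DRUID_URL):
    """Execute the query against a running Druid and return a list of row dicts."""
    with urllib.request.urlopen(build_sql_request(sql, base_url)) as response:
        return json.load(response)


# Usage against a live cluster (not executed here):
#   for row in run_sql(HOURLY_TRAFFIC_SQL):
#       print(row["hour"], row["service"], row["events"])
```

With resultFormat set to "object", each row comes back as a JSON object keyed by column alias, which is convenient for feeding dashboards or ad hoc scripts.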

§05

Related on TokRepo

Database Tools -- More database and data infrastructure tools
Monitoring Tools -- Real-time observability and analytics solutions

§06

Common pitfalls

Druid requires a minimum of 8 GB RAM for the single-server quickstart. Production clusters need dedicated nodes for each Druid service (Broker, Coordinator, Historical, MiddleManager).
Druid SQL covers a subset of standard SQL. Complex joins and subqueries may not be supported. Check the SQL compatibility matrix before migrating queries.
Real-time ingestion from Kafka requires careful tuning of task count and segment granularity to balance latency versus segment size.
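
To make the Kafka tuning knobs concrete, here is a hedged sketch of a supervisor spec in which task count and segment granularity are explicit parameters. The topic, broker address, datasource, and column names are hypothetical, and the values shown are starting points to experiment with rather than recommendations.

```python
import json
import urllib.request

# Assumption: the quickstart Router is listening on localhost:8888.
DRUID_URL = "http://localhost:8888"


def build_kafka_supervisor_spec(datasource, topic, bootstrap_servers,
                                task_count, segment_granularity):
    """Return a Kafka supervisor spec with the two tuning knobs exposed."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Hypothetical message fields: timestamp, service.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["service"]},
                "granularitySpec": {
                    # Coarser granularity means fewer, larger segments.
                    "segmentGranularity": segment_granularity,
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "kafka",
                "topic": topic,
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "inputFormat": {"type": "json"},
                # More tasks read in parallel but produce more small segments.
                "taskCount": task_count,
                "taskDuration": "PT1H",
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


def build_supervisor_request(spec, base_url=DRUID_URL):
    """Wrap the spec in a POST request to the supervisor endpoint."""
    return urllib.request.Request(
        url=base_url + "/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Build and print the spec for review; sending it requires passing the
# request to urllib.request.urlopen against a running cluster.
spec = build_kafka_supervisor_spec(
    "request_logs", "request-logs", "localhost:9092",
    task_count=2, segment_granularity="hour",
)
print(json.dumps(spec["spec"]["ioConfig"], indent=2))
```

Parameterizing taskCount and segmentGranularity makes it straightforward to try several combinations and compare ingestion lag against the resulting segment sizes.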

Frequently Asked Questions

What data sources can Druid ingest from?

Druid supports real-time ingestion from Apache Kafka and Amazon Kinesis, plus batch ingestion from HDFS, S3, GCS, Azure Blob, and local files. Ingestion specs can also be submitted through its HTTP API.

How does Druid compare to ClickHouse?

Both are column-oriented analytics databases. Druid excels at real-time ingestion with sub-second query latency on time-series data. ClickHouse supports more SQL features and ad-hoc queries. The choice depends on whether real-time ingestion or SQL completeness matters more.

Does Druid support SQL?

Yes. Druid provides a SQL interface via the /druid/v2/sql endpoint and the web console. It supports SELECT, WHERE, GROUP BY, ORDER BY, and common aggregation functions. Some advanced SQL features like window functions have limited support.

What is the minimum hardware for running Druid?

The single-server quickstart requires at least 8 GB RAM and runs all Druid services on one machine. Production deployments typically use dedicated nodes for each service role with 16 GB or more RAM per node.

Can Druid replace a traditional data warehouse?

Druid is optimized for time-series and event analytics with real-time ingestion. It is not a general-purpose data warehouse. Use it alongside a warehouse like BigQuery or Snowflake, with Druid handling real-time operational dashboards.
