Apache Druid — Real-Time Analytics Database for Event-Driven Data
Apache Druid powers interactive analytics on real-time event data. With column-oriented storage, time-based partitioning, and a distributed architecture, it serves sub-second queries on trillions of events per day. It is the OLAP engine used at companies such as Netflix and Airbnb.
What it is
Apache Druid is an open-source, distributed analytics database designed for real-time event data. It uses column-oriented storage, time-based partitioning, and a shared-nothing architecture to serve sub-second queries on large-scale event streams.
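As a rough illustration of why time-based partitioning and column-oriented storage help, the toy sketch below groups events into hourly chunks and stores each chunk column by column. It is not Druid's actual segment format; `hour_chunk`, `partition_by_hour`, and the sample events are hypothetical names and data chosen for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_chunk(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its hour (used as the chunk key)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def partition_by_hour(events):
    """Group row-oriented events into hourly chunks, each stored column by column."""
    chunks = defaultdict(lambda: defaultdict(list))
    for event in events:
        columns = chunks[hour_chunk(event["ts"])]
        for name, value in event.items():
            columns[name].append(value)
    return chunks


# Hypothetical sample events; every event carries the same fields.
events = [
    {"ts": datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc), "service": "api", "duration_ms": 12},
    {"ts": datetime(2024, 1, 1, 10, 40, tzinfo=timezone.utc), "service": "web", "duration_ms": 30},
    {"ts": datetime(2024, 1, 1, 11, 2, tzinfo=timezone.utc), "service": "api", "duration_ms": 7},
]
chunks = partition_by_hour(events)

# A query limited to the 10:00 hour touches one chunk and reads one column.
ten_oclock = chunks[datetime(2024, 1, 1, 10, tzinfo=timezone.utc)]
total_duration = sum(ten_oclock["duration_ms"])
```

A time filter narrows the work to the matching chunks, and an aggregation only scans the columns it needs, which is the intuition behind fast queries on large event streams.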
Druid targets data engineers and analytics teams who need interactive dashboards and slice-and-dice analytics on high-volume event data such as clickstreams, logs, or IoT telemetry.
How it saves time or tokens
Druid ingests data in real time from Kafka, Kinesis, or batch sources and makes it queryable within seconds. Unlike traditional data warehouses that require ETL pipelines with minutes-to-hours latency, Druid provides near-instant query results on fresh data, reducing the feedback loop for operational analytics.
How to use
- Download and start Druid:
curl -O https://dlcdn.apache.org/druid/30.0.1/apache-druid-30.0.1-bin.tar.gz
tar -xzf apache-druid-30.0.1-bin.tar.gz
cd apache-druid-30.0.1
./bin/start-druid
- Open the web console at http://localhost:8888.
- Load data via the console wizard or submit an ingestion spec via the API.
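For the API route, a minimal sketch of building a native batch ingestion spec and the HTTP request that submits it might look like the following. It assumes the quickstart is listening on `localhost:8888` and that tasks are submitted to `/druid/indexer/v1/task`; the datasource name, directory, file name, and column names are hypothetical placeholders, so adjust them to your data.

```python
import json
import urllib.request

DRUID_URL = "http://localhost:8888"  # assumed quickstart address


def build_batch_spec(datasource, base_dir, file_filter, timestamp_column, dimensions):
    """Build a native batch (index_parallel) spec that reads local JSON files."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": timestamp_column, "format": "iso"},
                "dimensionsSpec": {"dimensions": dimensions},
                "granularitySpec": {
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "local", "baseDir": base_dir, "filter": file_filter},
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


def build_task_request(spec, base_url=DRUID_URL):
    """Wrap the spec in a POST request for the task submission endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/druid/indexer/v1/task",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical datasource, directory, file, and columns.
spec = build_batch_spec(
    datasource="request_logs",
    base_dir="quickstart/tutorial",
    file_filter="request_logs.json",
    timestamp_column="timestamp",
    dimensions=["service", {"type": "long", "name": "duration_ms"}],
)
request = build_task_request(spec)

# To submit against a running cluster, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes the spec easy to inspect or save to a file before anything reaches the cluster.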
Example
-- Query via Druid SQL (web console or /druid/v2/sql endpoint)
SELECT
  TIME_FLOOR(__time, 'PT1H') AS hour,
  service,
  COUNT(*) AS events,
  SUM(duration_ms) AS total_duration
FROM request_logs
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
GROUP BY 1, 2
ORDER BY events DESC
LIMIT 20
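The same query can be sent to the `/druid/v2/sql` endpoint from a script. The sketch below uses only the Python standard library and assumes the quickstart address `http://localhost:8888`; `build_sql_request` and `run_sql` are hypothetical helper names, and `request_logs` is the example datasource from the query above.

```python
import json
import urllib.request

SQL = """
SELECT
  TIME_FLOOR(__time, 'PT1H') AS hour,
  service,
  COUNT(*) AS events,
  SUM(duration_ms) AS total_duration
FROM request_logs
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
GROUP BY 1, 2
ORDER BY events DESC
LIMIT 20
"""


def build_sql_request(query, base_url="http://localhost:8888"):
    """Build a POST request carrying the query as a JSON payload."""
    payload = {"query": query, "resultFormat": "object"}
    return urllib.request.Request(
        url=f"{base_url}/druid/v2/sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(query, base_url="http://localhost:8888"):
    """Send the query and return the decoded JSON rows (needs a running cluster)."""
    with urllib.request.urlopen(build_sql_request(query, base_url)) as response:
        return json.load(response)


request = build_sql_request(SQL)

# With Druid running and the datasource loaded:
# for row in run_sql(SQL):
#     print(row["hour"], row["service"], row["events"])
```

With `resultFormat` set to `object`, each row should come back as a JSON object keyed by the column aliases in the SELECT list.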
Related on TokRepo
- Database Tools -- More database and data infrastructure tools
- Monitoring Tools -- Real-time observability and analytics solutions
Common pitfalls
- Druid requires a minimum of 8 GB RAM for the single-server quickstart. Production clusters typically run the Druid services (such as Broker, Coordinator, Historical, and MiddleManager) on dedicated nodes.
- Druid SQL covers a subset of standard SQL. Complex joins and subqueries may not be supported. Check the SQL compatibility matrix before migrating queries.
- Real-time ingestion from Kafka requires careful tuning of task count and segment granularity to balance latency versus segment size.
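To show where the Kafka tuning knobs live, here is a hedged sketch of a supervisor spec built in Python. The topic, broker address, and column names are hypothetical, and the values for `taskCount`, `taskDuration`, and `segmentGranularity` are starting points for experimentation rather than recommendations. A spec like this is normally submitted with a POST to `/druid/indexer/v1/supervisor`.

```python
import json


def build_kafka_supervisor_spec(
    datasource,
    topic,
    bootstrap_servers,
    task_count=1,
    segment_granularity="hour",
    task_duration="PT1H",
):
    """Build a Kafka supervisor spec exposing the main latency/segment-size knobs."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Hypothetical column names; match them to your event schema.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["service"]},
                "granularitySpec": {
                    # Coarser granularity tends to give fewer, larger segments.
                    "segmentGranularity": segment_granularity,
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "kafka",
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                # Reading tasks are spread over topic partitions, so a count
                # above the partition count adds no extra parallelism.
                "taskCount": task_count,
                # How long a task reads before it publishes its segments.
                "taskDuration": task_duration,
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


spec = build_kafka_supervisor_spec(
    datasource="request_logs",
    topic="request-logs",                # hypothetical topic name
    bootstrap_servers="localhost:9092",  # hypothetical broker address
    task_count=2,
)
spec_json = json.dumps(spec, indent=2)
```

Keeping the knobs as function parameters makes it easier to compare a few configurations while watching ingestion lag and segment sizes.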
Frequently Asked Questions
What data sources can Druid ingest from?
Druid supports real-time ingestion from Apache Kafka and Amazon Kinesis, plus batch ingestion from HDFS, S3, GCS, Azure Blob, and local files. It also supports push-based ingestion via its HTTP API.
How does Druid compare to ClickHouse?
Both are column-oriented analytics databases. Druid excels at real-time ingestion with sub-second query latency on time-series data. ClickHouse supports more SQL features and ad-hoc queries. The choice depends on whether real-time ingestion or SQL completeness matters more.
Does Druid support SQL?
Yes. Druid provides a SQL interface via the /druid/v2/sql endpoint and the web console. It supports SELECT, WHERE, GROUP BY, ORDER BY, and common aggregation functions. Some advanced SQL features, such as window functions, have limited support.
What are the hardware requirements?
The single-server quickstart requires at least 8 GB RAM and runs all Druid services on one machine. Production deployments typically use dedicated nodes for each service role, with 16 GB or more RAM per node.
Can Druid replace a data warehouse?
Druid is optimized for time-series and event analytics with real-time ingestion. It is not a general-purpose data warehouse. Use it alongside a warehouse like BigQuery or Snowflake, with Druid handling real-time operational dashboards.