Skills · April 14, 2026 · 1 min read

Apache Druid — Real-Time Analytics Database for Event-Driven Data

Apache Druid powers interactive analytics on real-time event data. With column-oriented storage, time-based partitioning, and a distributed architecture, it is designed to serve sub-second queries over very large event volumes. Netflix and Airbnb are among the companies that run it as an OLAP engine.

Apache Software Foundation · Community

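Agent ready

Review before installing

This asset needs review before installation. The copied instruction asks the agent to do a dry run, list the files it would write, and continue only after confirmation.

Needs Confirmation · 64/100 · Policy: confirmation required

Agent entry: any MCP/CLI agent
Type: Skill
Install: Single
Trust level: Community
Entry: step-1.md

Review the command first

npx -y tokrepo@latest install 0963f669-37d2-11f1-9bc6-00163e2b0d79 --target codex

Do a dry run first and confirm the listed writes before running this command.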

TL;DR

Apache Druid delivers sub-second OLAP queries on real-time event streams with column-oriented storage and time partitioning.

§01

What it is

Apache Druid is an open-source, distributed analytics database designed for real-time event data. It uses column-oriented storage, time-based partitioning, and a shared-nothing architecture to serve sub-second queries on large-scale event streams.

Druid targets data engineers and analytics teams who need interactive dashboards and slice-and-dice analytics on high-volume event data such as clickstreams, logs, or IoT telemetry.
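To make the two storage ideas concrete, here is a toy sketch in plain Python. It is not Druid's segment format or API. It only illustrates why grouping events into time chunks and storing each chunk column by column lets a time-filtered aggregate skip most of the data. The function names and the sample events are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_chunk(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour (the chunk key)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def to_columnar_chunks(events):
    """Group row-shaped events by hour, then pivot each group into columns."""
    chunks = defaultdict(lambda: defaultdict(list))
    for event in events:
        key = hour_chunk(event["ts"])
        for column, value in event.items():
            chunks[key][column].append(value)
    return chunks


def total_duration(chunks, start, end):
    """Sum duration_ms, reading only chunks in [start, end) and one column."""
    return sum(
        sum(columns["duration_ms"])
        for key, columns in chunks.items()
        if start <= key < end
    )


# Hypothetical sample events, shaped like the request_logs rows used later.
events = [
    {"ts": datetime(2026, 4, 14, 9, 5, tzinfo=timezone.utc), "service": "api", "duration_ms": 120},
    {"ts": datetime(2026, 4, 14, 9, 40, tzinfo=timezone.utc), "service": "web", "duration_ms": 80},
    {"ts": datetime(2026, 4, 14, 10, 15, tzinfo=timezone.utc), "service": "api", "duration_ms": 200},
]
chunks = to_columnar_chunks(events)
nine = datetime(2026, 4, 14, 9, tzinfo=timezone.utc)
ten = datetime(2026, 4, 14, 10, tzinfo=timezone.utc)
print(total_duration(chunks, nine, ten))  # 200
```

The query for the 09:00 hour never touches the 10:00 chunk or the service column, which is the intuition behind pruning by time and by column.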

§02

How it saves time or tokens

Druid ingests data in real time from Kafka, Kinesis, or batch sources and makes it queryable within seconds. Unlike traditional data warehouses that require ETL pipelines with minutes-to-hours latency, Druid provides near-instant query results on fresh data, reducing the feedback loop for operational analytics.

§03

How to use

Download and start Druid:

curl -O https://dlcdn.apache.org/druid/30.0.1/apache-druid-30.0.1-bin.tar.gz
tar -xzf apache-druid-30.0.1-bin.tar.gz
cd apache-druid-30.0.1
./bin/start-druid

Open the web console at http://localhost:8888.

Load data via the console wizard or submit an ingestion spec via the API.
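As a rough illustration of the API route, the sketch below builds a minimal Kafka supervisor spec and prepares a POST to the supervisor endpoint using only the Python standard library. The datasource name, topic, broker address, and column names are assumptions made for this example, and a real spec usually needs more tuning than is shown here.

```python
import json
import urllib.request

# Assumption: the quickstart Router on port 8888 proxies the Overlord API.
DRUID_URL = "http://localhost:8888"


def kafka_supervisor_spec(datasource, topic, bootstrap_servers, dimensions):
    """Return a minimal Kafka supervisor spec as a plain dict."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Assumes each event carries an ISO 8601 "timestamp" field.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": dimensions},
                "granularitySpec": {
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "taskCount": 1,
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


def supervisor_request(spec):
    """Build (but do not send) the POST request that submits the spec."""
    return urllib.request.Request(
        f"{DRUID_URL}/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical datasource, topic, and broker address.
spec = kafka_supervisor_spec(
    "request_logs",
    "request-logs",
    "localhost:9092",
    ["service", {"type": "long", "name": "duration_ms"}],
)
request = supervisor_request(spec)
print(request.get_method(), request.full_url)

# To submit against a running cluster, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect or log the spec before anything is written to the cluster.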

§04

Example

-- Query via Druid SQL (web console or /druid/v2/sql endpoint)
SELECT
  TIME_FLOOR(__time, 'PT1H') AS hour_start,
  service,
  COUNT(*) AS events,
  SUM(duration_ms) AS total_duration
FROM request_logs
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
GROUP BY 1, 2
ORDER BY events DESC
LIMIT 20
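The same kind of query can be sent to the /druid/v2/sql endpoint over HTTP. The following is a minimal sketch using the Python standard library. It assumes the quickstart Router at localhost:8888 and a request_logs datasource with a service column, and the helper names are made up for this example.

```python
import json
import urllib.request

# Assumption: the quickstart Router address used elsewhere on this page.
SQL_ENDPOINT = "http://localhost:8888/druid/v2/sql"

QUERY = """
SELECT service, COUNT(*) AS events
FROM request_logs
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
GROUP BY 1
ORDER BY events DESC
LIMIT 20
"""


def sql_request(query, endpoint=SQL_ENDPOINT):
    """Build (but do not send) a Druid SQL request with a JSON body."""
    body = {"query": query, "resultFormat": "object"}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query):
    """Send the query and return the decoded JSON rows (needs a live cluster)."""
    with urllib.request.urlopen(sql_request(query)) as response:
        return json.loads(response.read().decode("utf-8"))


request = sql_request(QUERY)
print(request.get_method(), request.full_url)
# rows = run_query(QUERY)  # uncomment when Druid is running
```

With resultFormat set to "object", each row in the response is a JSON object keyed by column name, which is convenient for feeding a dashboard or a script.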

§05

Related on TokRepo

Database Tools -- More database and data infrastructure tools
Monitoring Tools -- Real-time observability and analytics solutions

§06

Common pitfalls

Plan for at least 8 GB of RAM for the single-server quickstart. Production clusters usually spread the Druid services across dedicated servers, commonly grouped as Master (Coordinator, Overlord), Query (Broker, Router), and Data (Historical, MiddleManager).
Druid SQL covers a subset of standard SQL. Some joins and subqueries are restricted or unsupported. Check the Druid SQL documentation before migrating queries.
Real-time ingestion from Kafka requires careful tuning of task count and segment granularity to balance latency versus segment size.
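The Kafka tuning trade-off in the last item can be estimated with simple arithmetic. The sketch below assumes each ingestion task writes at least one segment for every time chunk it receives data for, so it gives a rough lower bound on segment count and an upper bound on average segment size. Task restarts and late data create more segments in practice. The function names and traffic figures are hypothetical.

```python
from datetime import timedelta

# A few segment granularities expressed as chunk lengths.
GRANULARITY = {
    "FIFTEEN_MINUTE": timedelta(minutes=15),
    "HOUR": timedelta(hours=1),
    "DAY": timedelta(days=1),
}


def min_segments_per_day(task_count, segment_granularity):
    """Lower bound: one segment per task per time chunk."""
    chunks_per_day = timedelta(days=1) // GRANULARITY[segment_granularity]
    return task_count * chunks_per_day


def avg_segment_rows(events_per_day, task_count, segment_granularity):
    """Average rows per segment if events are spread evenly (an idealization)."""
    return events_per_day // min_segments_per_day(task_count, segment_granularity)


# Hypothetical workload: 100 million events per day, 4 ingestion tasks.
for granularity in ("FIFTEEN_MINUTE", "HOUR", "DAY"):
    print(
        granularity,
        min_segments_per_day(4, granularity),
        avg_segment_rows(100_000_000, 4, granularity),
    )
```

Finer granularity or more tasks yields many small segments, while coarser granularity yields fewer, larger ones. Comparing the estimate against a target of a few million rows per segment is one way to pick a starting point before measuring on real data.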

FAQ

What data sources can Druid ingest from?

Druid supports real-time ingestion from Apache Kafka and Amazon Kinesis, plus batch ingestion from HDFS, S3, GCS, Azure Blob, and local files. Ingestion jobs are defined as specs and submitted through the web console or the HTTP API.

How does Druid compare to ClickHouse?

Both are column-oriented analytics databases. Druid excels at real-time ingestion with sub-second query latency on time-series data. ClickHouse supports more SQL features and ad-hoc queries. The choice depends on whether real-time ingestion or SQL completeness matters more.

Does Druid support SQL?

Yes. Druid provides a SQL interface via the /druid/v2/sql endpoint and the web console. It supports SELECT, WHERE, GROUP BY, ORDER BY, and common aggregation functions. Some advanced SQL features like window functions have limited support.

What is the minimum hardware for running Druid?

The single-server quickstart needs roughly 8 GB of RAM and runs all Druid services on one machine. Production deployments typically use dedicated servers for each group of services, with 16 GB or more of RAM per node.

Can Druid replace a traditional data warehouse?

Druid is optimized for time-series and event analytics with real-time ingestion. It is not a general-purpose data warehouse. Use it alongside a warehouse like BigQuery or Snowflake, with Druid handling real-time operational dashboards.

Related assets

Apache Pinot — Real-Time Distributed OLAP Datastore

Apache Hudi — Incremental Data Processing for Data Lakehouses

Apache Doris — Modern MPP Analytical Database for Real-Time Reporting

Apache Flink — Stream Processing Framework for Real-Time Data