Apache Doris — Modern MPP Analytical Database for Real-Time Reporting
Apache Doris is a high-performance real-time analytical database. It combines MySQL-compatible SQL, sub-second query latency, and support for federated queries across data lakes, Hive, Iceberg, and Hudi — the open-source answer to Snowflake and BigQuery.
What it is
Apache Doris is a high-performance, real-time analytical database built for online analytical processing (OLAP). It provides MySQL-compatible SQL, sub-second query latency on large datasets, and federated queries across data lakes including Hive, Iceberg, and Hudi.
Apache Doris targets data engineers and analysts who need fast dashboards, ad-hoc reporting, and real-time analytics without the complexity of a separate ETL pipeline.
How it saves time or tokens
Doris ingests data in real time and serves analytical queries immediately, eliminating the batch ETL window. You can query fresh data within seconds of ingestion. The MySQL protocol compatibility means existing BI tools (Grafana, Superset, Metabase) connect without custom drivers.
The built-in materialized views and rollup tables pre-aggregate common queries, reducing compute time for repeated dashboard requests.
How to use
- Deploy Doris: download the binary or use Docker
- Start the Frontend (FE) and Backend (BE) nodes
- Connect with any MySQL client:
mysql -h 127.0.0.1 -P 9030 -u root - Create tables and load data via Stream Load or Routine Load from Kafka
Example
-- Create an aggregate table for web analytics
CREATE TABLE page_views (
event_date DATE,
page_url VARCHAR(512),
user_id BIGINT,
view_count BIGINT SUM DEFAULT '0'
)
AGGREGATE KEY(event_date, page_url, user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 16
PROPERTIES ('replication_num' = '1');
-- Load data via Stream Load
-- curl -u root: -H 'format: json' -T data.json http://fe:8030/api/db/page_views/_stream_load
-- Query with standard SQL
SELECT page_url, SUM(view_count) as total_views
FROM page_views
WHERE event_date >= '2026-04-01'
GROUP BY page_url
ORDER BY total_views DESC
LIMIT 10;
Related on TokRepo
- Database tools -- Database management and analytics
- Monitoring tools -- Dashboards and real-time monitoring
Common pitfalls
- Doris requires separate FE and BE processes; minimum production setup is 3 FE nodes and 3 BE nodes for high availability
- Choosing the wrong data model (Aggregate, Unique, or Duplicate) affects query performance significantly; read the model guide before designing tables
- Stream Load has a 10 GB default limit per request; batch large imports into smaller chunks
Frequently Asked Questions
Both are columnar OLAP databases. Doris provides MySQL protocol compatibility and easier operations. ClickHouse offers more analytical functions and typically faster raw query performance. Doris is often preferred when MySQL compatibility and simpler operations matter more.
Yes. Doris supports Stream Load for HTTP-based ingestion, Routine Load for continuous Kafka consumption, and Broker Load for batch imports from HDFS or S3. Data is queryable within seconds of ingestion.
Yes. Doris supports federated queries across Hive, Iceberg, Hudi, and Delta Lake catalogs. You register external catalogs and query them with standard SQL alongside Doris internal tables.
A single FE and single BE node can run on a machine with 4 cores and 16 GB RAM for testing. Production deployments should have at least 3 FE nodes and 3 BE nodes with SSDs for optimal performance.
Yes. Apache Doris is an Apache Software Foundation top-level project under the Apache 2.0 license. Commercial distributions like SelectDB offer managed hosting and enterprise support.
Citations (3)
- Apache Doris GitHub— Apache Doris is a real-time analytical database
- Apache Doris Docs— MySQL protocol compatibility for BI tool integration
- Apache Doris Docs— Federated query support for Hive, Iceberg, and Hudi
Related on TokRepo
Discussion
Related Assets
HumHub — Open-Source Enterprise Social Network
A flexible, open-source social networking platform built on Yii2 for creating private communities, intranets, and collaboration spaces within organizations.
Dolibarr — Open-Source ERP & CRM for Business Management
A modular open-source ERP and CRM application written in PHP for managing contacts, invoices, orders, inventory, accounting, and more from a single web interface.
PrestaShop — Open-Source PHP E-Commerce Platform
A widely adopted open-source e-commerce platform written in PHP with a rich module marketplace, multi-language support, and a strong European user base.