Introduction
Apache Doris (originally Palo, developed at Baidu) is an open-source MPP analytical database that competes with Snowflake and BigQuery on price-performance. It speaks the MySQL wire protocol, so most BI tools, drivers, and ORMs that support MySQL work without changes, and it can return sub-second results over billions of rows on modest hardware.
With over 15,000 GitHub stars, Doris is used by Xiaomi, JD.com, Meituan, and more than 4,000 companies. Its federated query engine also reads Hive, Iceberg, and Hudi tables and files in object storage, so a single SQL query can span your warehouse and your data lake.
What Doris Does
Doris uses a classic MPP architecture: Frontend (FE) nodes handle metadata, SQL parsing, and query planning; Backend (BE) nodes store data in columnar form and execute queries in parallel. Hot data lives in Doris's own columnar format, while federated connectors (external catalogs) query Hive and Iceberg tables in place, without loading them first.
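To make the "in place" part concrete, here is a minimal sketch of a cross-catalog join. It assumes a hypothetical internal table `demo.users` (with `user_id` and `country` columns) and an external catalog named `iceberg`, created with `CREATE CATALOG`, that exposes a hypothetical `db.events` table. `internal` is the name Doris gives its own built-in catalog.

```sql
-- List the catalogs this FE knows about; "internal" is Doris's own storage.
SHOW CATALOGS;

-- Hypothetical names: internal.demo.users lives in BE columnar storage,
-- iceberg.db.events is read from the lake in place by the BE nodes.
SELECT u.country, COUNT(*) AS events
FROM internal.demo.users AS u
JOIN iceberg.db.events AS e
  ON e.user_id = u.user_id
WHERE e.dt = '2026-04-14'
GROUP BY u.country
ORDER BY events DESC;
```

Fully qualified `catalog.database.table` names keep the query unambiguous when several catalogs are attached.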
Architecture Overview
```text
          Clients (MySQL protocol)
                  |
            [FE Frontend]
  Metadata, parser, planner, scheduler
  Highly available via BDBJE replication
                  |
     +--------+--------+--------+
     |        |        |        |
    [BE]     [BE]     [BE]     [BE]
      Columnar storage
      Distributed execution
      Segment files per tablet
                  |
        [Federated Connectors]
         Hive, Iceberg, Hudi,
     JDBC (Postgres/MySQL/Oracle),
     Elasticsearch, object storage
```
Self-Hosting & Configuration
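Once the FE and BE processes are running, BEs are registered with the FE over the same MySQL protocol that clients use. A minimal sketch, assuming hypothetical host names `be1` to `be3`, the default FE query port 9030, and the default BE heartbeat port 9050:

```sql
-- Connect first with any MySQL client, e.g.: mysql -h fe_host -P 9030 -uroot
-- Register each BE; be1..be3 are hypothetical hosts, 9050 is the default
-- BE heartbeat_service_port.
ALTER SYSTEM ADD BACKEND "be1:9050";
ALTER SYSTEM ADD BACKEND "be2:9050";
ALTER SYSTEM ADD BACKEND "be3:9050";

-- Confirm every node reports Alive = true.
SHOW FRONTENDS;
SHOW BACKENDS;
```

Three BEs line up with Doris's default of three replicas per tablet; a single-BE test cluster needs tables created with a replica count of 1.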
```sql
-- Multi-model table types:
--   Duplicate (raw event log): keeps every row
--   Aggregate (pre-aggregated metrics): rolls up rows that share a key
--   Unique (primary-key style): upsert semantics

-- Aggregate model: rows with the same (dt, dim1) are merged,
-- summing cnt and unioning the users bitmaps.
CREATE TABLE daily_stats (
    dt    DATE,
    dim1  VARCHAR(64),
    cnt   BIGINT SUM,
    users BITMAP BITMAP_UNION
)
AGGREGATE KEY(dt, dim1)
DISTRIBUTED BY HASH(dim1) BUCKETS 10;

-- Query an Iceberg table in place through an external catalog
-- whose metadata lives in a Hive Metastore ("hms").
CREATE CATALOG iceberg PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "hms",
    "hive.metastore.uris" = "thrift://metastore:9083"
);

SELECT COUNT(*)
FROM iceberg.db.events
WHERE dt = '2026-04-14';
```
Key Features
- MySQL compatible: MySQL wire protocol plus a subset of MySQL syntax, so existing MySQL drivers work without changes
- Sub-second OLAP: vectorized execution and a cost-based optimizer
- Real-time ingest: Stream Load, Routine Load (Kafka), Flink CDC
- Federated queries: Hive, Iceberg, Hudi, JDBC, Elasticsearch, object storage
- High availability: FE metadata replication (BDBJE) plus BE tablet replicas
- Materialized views: queries are automatically rewritten to use pre-aggregates
- Row- and column-level security: row policies and column masking for BI tools
- Apache top-level project: vendor-neutral governance, active community
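As an example of the real-time ingest path, here is a minimal Routine Load sketch that continuously consumes JSON messages from Kafka into an existing table. The job, database, table, topic, and broker names are all hypothetical, and the JSON fields are assumed to match the target table's column names.

```sql
-- Create a long-running job that pulls from Kafka into demo.events.
CREATE ROUTINE LOAD demo.events_job ON events
PROPERTIES (
    "format" = "json",
    "desired_concurrent_number" = "3"
)
FROM KAFKA (
    "kafka_broker_list" = "kafka1:9092,kafka2:9092",
    "kafka_topic" = "events",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);

-- Check state, consumed offsets, and error rows for the job.
SHOW ROUTINE LOAD FOR demo.events_job;
```

The FE schedules the job and splits it into tasks that run on BE nodes, so ingestion scales out with the cluster rather than through an external loader.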
Comparison with Similar Tools
| Feature | Doris | StarRocks | ClickHouse | Apache Pinot | Druid |
|---|---|---|---|---|---|
| Dialect | MySQL SQL | MySQL SQL | Own SQL | SQL | SQL |
| Transactions | Limited | Limited | Limited | No | No |
| Federated queries | Yes | Yes | Yes | Limited | Limited |
| Concurrency | Very High | Very High | Moderate | Very High | Very High |
| Real-time ingest | Yes | Yes | Yes (async) | Yes | Yes |
| Operational complexity | Low-Moderate | Moderate | Moderate | High | High |
| Best For | Self-serve BI + data lake | Self-serve BI | Raw speed analytics | User-facing analytics | User-facing analytics |
FAQ
Q: Doris vs StarRocks: they look identical? A: They share history (StarRocks began as a fork of Doris). Today they are independent projects. Doris has vendor-neutral governance under the ASF, while StarRocks has posted faster query times in many benchmarks. Evaluate both against your own workload.
Q: Doris vs ClickHouse? A: ClickHouse is known for raw single-server scan speed; Doris tends to do better with high concurrency, join-heavy queries, and MySQL compatibility. For dashboards serving many users, Doris is often easier; for log analytics, ClickHouse often wins.
Q: Can Doris replace my data warehouse? A: For many mid-sized setups, yes. Doris handles ingestion, OLAP queries, and federated lake queries in one engine. At petabyte scale, where self-hosting would demand significant custom engineering, Snowflake and BigQuery still lead.
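One way the "one engine" idea plays out is an asynchronous materialized view that keeps a pre-aggregated copy of a lake table inside Doris. A minimal sketch, assuming a recent release (2.1 or later) with asynchronous materialized views and the hypothetical `iceberg.db.events` table; check the exact refresh syntax against your version's documentation.

```sql
-- Hypothetical MV: daily event counts from an Iceberg table, stored in
-- Doris's own columnar format and fully recomputed every hour.
CREATE MATERIALIZED VIEW demo.mv_daily_events
BUILD IMMEDIATE
REFRESH COMPLETE ON SCHEDULE EVERY 1 HOUR
DISTRIBUTED BY RANDOM BUCKETS 2
AS
SELECT dt, COUNT(*) AS events
FROM iceberg.db.events
GROUP BY dt;

-- Dashboards can read the local copy instead of scanning the lake each time.
SELECT dt, events
FROM demo.mv_daily_events
WHERE dt >= '2026-04-01';
```

Whether the optimizer also rewrites queries against `iceberg.db.events` to use this view automatically depends on version and session settings, so querying the MV directly is the safer assumption.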
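Q: How does Doris handle upserts? A: Use the Unique Key model. A write with an existing key replaces the old values. Doris offers two strategies: merge-on-read (MoR), which merges row versions at query time, and merge-on-write (MoW), which resolves duplicate keys during ingestion so reads are faster. Pick the one that suits your read/write balance.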
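A minimal sketch of upsert behavior on a Unique Key table with merge-on-write enabled. The table and column names are hypothetical; `enable_unique_key_merge_on_write` is the table property that selects MoW.

```sql
-- Hypothetical Unique Key table using merge-on-write: older versions of a
-- key are marked deleted at load time, so queries skip the merge step.
CREATE TABLE demo.user_profiles (
    user_id BIGINT,
    email   VARCHAR(128),
    updated DATETIME
)
UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 10
PROPERTIES (
    "enable_unique_key_merge_on_write" = "true"
);

-- Two separate writes with the same key: the later one replaces the earlier.
INSERT INTO demo.user_profiles VALUES (1, 'old@example.com', '2026-04-13 09:00:00');
INSERT INTO demo.user_profiles VALUES (1, 'new@example.com', '2026-04-14 09:00:00');

-- Expected: one row for user_id 1, carrying the newer email.
SELECT * FROM demo.user_profiles WHERE user_id = 1;
```

Leaving the property set to `"false"` gives merge-on-read instead, which makes writes cheaper at the cost of merging versions when queries run.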
Sources
- GitHub: https://github.com/apache/doris
- Docs: https://doris.apache.org/docs
- Foundation: Apache Software Foundation
- License: Apache-2.0