Configs2026年4月14日·1 分钟阅读

Apache Doris — Modern MPP Analytical Database for Real-Time Reporting

Apache Doris is a high-performance real-time analytical database. It combines MySQL-compatible SQL, sub-second query latency, and support for federated queries across data lakes, Hive, Iceberg, and Hudi — the open-source answer to Snowflake and BigQuery.

Introduction

Apache Doris (originally Palo by Baidu) is the open-source MPP analytical database that rivals Snowflake and BigQuery for price and performance. It speaks MySQL wire protocol, so every BI tool and ORM works unchanged, and delivers sub-second queries over billions of rows on modest hardware.

With over 15,000 GitHub stars, Doris is used by Xiaomi, JD.com, Meituan, and more than 4,000 companies. The federated query engine also reads Hive, Iceberg, Hudi, and object storage — letting you run one SQL across your warehouse and data lake.

What Doris Does

Doris runs a classic MPP architecture: Frontend (FE) nodes handle metadata, SQL parsing, and planning; Backend (BE) nodes store data columnar and execute queries in parallel. Data is stored in its own columnar format for hot queries, and federated connectors serve queries over Hive/Iceberg tables in place.

Architecture Overview

Clients (MySQL protocol)
        |
  [FE Frontend]
   Metadata, parser, planner, scheduler
   Highly available via BDBJE Raft
        |
  +--------+--------+--------+
  |        |        |        |
 [BE]    [BE]    [BE]    [BE]
  Columnar storage
  Distributed execution
   Segment files per tablet
        |
  [Federated Connectors]
   Hive, Iceberg, Hudi,
   JDBC (Postgres/MySQL/Oracle),
   Elasticsearch, object storage

Self-Hosting & Configuration

-- Multi-model table types
-- Duplicate (raw event log): keep every row
-- Aggregate (pre-aggregated metrics): auto-rollup
-- Unique (primary-key style): upsert semantics
CREATE TABLE daily_stats (
  dt DATE,
  dim1 VARCHAR(64),
  cnt BIGINT    SUM,
  users BITMAP  BITMAP_UNION
)
AGGREGATE KEY(dt, dim1)
DISTRIBUTED BY HASH(dim1) BUCKETS 10;

-- Query Iceberg table via catalog
CREATE CATALOG iceberg PROPERTIES (
  "type" = "iceberg",
  "iceberg.catalog.type" = "hive",
  "hive.metastore.uris" = "thrift://metastore:9083"
);

SELECT COUNT(*)
FROM iceberg.db.events
WHERE dt = '2026-04-14';

Key Features

  • MySQL compatible — wire protocol + syntax subset, no driver changes
  • Sub-second OLAP — vectorized execution, cost-based optimizer
  • Real-time ingest — Stream Load, Routine Load (Kafka), Flink CDC
  • Federated queries — Hive, Iceberg, Hudi, JDBC, ES, object storage
  • High availability — FE Raft replication + BE tablet replication
  • Materialized views — auto-rewrite queries to use pre-aggregates
  • Column-level security — row + column masking for BI tools
  • Apache top-level project — neutral governance, active community

Comparison with Similar Tools

Feature Doris StarRocks ClickHouse Apache Pinot Druid
Dialect MySQL SQL MySQL SQL Own SQL SQL SQL
Transactions Limited Limited Limited No No
Federated queries Yes Yes Yes Limited Limited
Concurrency Very High Very High Moderate Very High Very High
Real-time ingest Yes Yes Yes (async) Yes Yes
Ease of ops Low-Moderate Moderate Moderate High High
Best For Self-serve BI + data lake Self-serve BI Raw speed analytics User-facing analytics User-facing analytics

FAQ

Q: Doris vs StarRocks — they look identical? A: They share history (StarRocks forked from Doris). Today they're independent projects. Doris has broader community governance (ASF), StarRocks has faster query performance in many benchmarks. Evaluate both with your workload.

Q: Doris vs ClickHouse? A: ClickHouse has the raw speed for single-server analytics; Doris has better high-concurrency, join-heavy, and MySQL-compatible experience. For dashboards with many users, Doris is often easier; for log analytics, ClickHouse often wins.

Q: Can Doris replace my data warehouse? A: For many mid-sized setups, yes. Doris handles ingestion, OLAP queries, and federated lake queries in one engine. For petabyte-scale custom engineering, Snowflake/BigQuery still lead.

Q: How does Doris handle upserts? A: Use the Unique Key model. Writes with the same key replace old values. Doris implements MOR (merge-on-read) and COW (copy-on-write) strategies depending on your workload.

Sources

讨论

登录后参与讨论。
还没有评论,来写第一条吧。

相关资产