
Skills · Apr 14, 2026 · 3 min read

Apache Doris — Modern MPP Analytical Database for Real-Time Reporting

Apache Doris is a high-performance, real-time analytical database. It combines MySQL-compatible SQL, sub-second query latency, and federated queries across data lake sources such as Hive, Iceberg, and Hudi, and is positioned as an open-source alternative to Snowflake and BigQuery.

Apache Software Foundation · Community

Agent-ready

Install with prior review

This asset requires review. The copied prompt requests a dry run, shows the planned writes, and continues only after confirmation.

Needs Confirmation · 64/100 · Policy: confirm

Agent surface: Any MCP/CLI agent

Type: Skill

Installation: Single

Trust: Community

Entry point: step-1.md

Review-first command

npx -y tokrepo@latest install 0906d4d6-37d2-11f1-9bc6-00163e2b0d79 --target codex

Do a dry run first, confirm the writes, then run this command.

TL;DR

Apache Doris delivers sub-second analytical queries with MySQL-compatible SQL and data lake support.

§01

What it is

Apache Doris is a high-performance, real-time analytical database built for online analytical processing (OLAP). It provides MySQL-compatible SQL, sub-second query latency on large datasets, and federated queries across data lakes including Hive, Iceberg, and Hudi.

Apache Doris targets data engineers and analysts who need fast dashboards, ad-hoc reporting, and real-time analytics without the complexity of a separate ETL pipeline.

§02

How it saves time or tokens

Doris ingests data in real time and serves analytical queries immediately, eliminating the batch ETL window. You can query fresh data within seconds of ingestion. The MySQL protocol compatibility means existing BI tools (Grafana, Superset, Metabase) connect without custom drivers.
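As an illustration of the MySQL-protocol point, tools built on SQLAlchemy (Superset is one) are commonly pointed at a MySQL-compatible server through a connection URI. The sketch below only builds such a string; the helper name `doris_sqlalchemy_uri`, the `mysql+pymysql` driver choice, and the `analytics` database are assumptions for illustration, not part of the original page.

```python
from urllib.parse import quote


def doris_sqlalchemy_uri(host, database, user="root", password="", port=9030):
    """Return a MySQL-dialect SQLAlchemy URI pointing at a Doris FE node.

    Hypothetical helper: 9030 is the FE query port used elsewhere on this
    page; the driver name (pymysql) is an assumption.
    """
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    auth = quote(user, safe="")
    if password:
        auth += ":" + quote(password, safe="")
    return f"mysql+pymysql://{auth}@{host}:{port}/{database}"


print(doris_sqlalchemy_uri("127.0.0.1", "analytics"))
```

Whether a given BI tool accepts this exact URI form depends on the tool and the driver it ships with, so treat it as a starting point.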

The built-in materialized views and rollup tables pre-aggregate common queries, reducing compute time for repeated dashboard requests.

§03

How to use

1. Deploy Doris: download the binary or use Docker.
2. Start the Frontend (FE) and Backend (BE) nodes.
3. Connect with any MySQL client: mysql -h 127.0.0.1 -P 9030 -u root
4. Create tables and load data via Stream Load or Routine Load from Kafka.
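The Stream Load step can be sketched in Python as a plain HTTP PUT. The function below only builds the request object and does not send it; the function name, the `fe` host, and the example label are hypothetical, and the `strip_outer_array` header assumes the payload is a JSON array of rows.

```python
import base64
import urllib.request


def build_stream_load_request(fe_host, database, table, payload, *,
                              user="root", password="", http_port=8030,
                              label=None):
    """Build (but do not send) an HTTP PUT request for Doris Stream Load."""
    url = f"http://{fe_host}:{http_port}/api/{database}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        "format": "json",
        "strip_outer_array": "true",  # assumption: payload is a JSON array
    }
    if label is not None:
        # A label identifies the load job; reusing one lets Doris reject duplicates.
        headers["label"] = label
    return urllib.request.Request(url, data=payload, headers=headers, method="PUT")


req = build_stream_load_request("fe", "db", "page_views", b"[]",
                                label="page_views_demo")
print(req.get_method(), req.full_url)
```

Sending the request is left out on purpose: the FE usually answers with a redirect to a BE, and the client has to resend the credentials there (curl does this with `--location-trusted`), which `urllib` does not do for PUT out of the box.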

§04

Example

-- Create an aggregate table for web analytics
CREATE TABLE page_views (
    event_date DATE,
    page_url VARCHAR(512),
    user_id BIGINT,
    view_count BIGINT SUM DEFAULT '0'
)
AGGREGATE KEY(event_date, page_url, user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 16
PROPERTIES ('replication_num' = '1');

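-- Load data via Stream Load (run from a shell, not the SQL client).
-- The FE redirects the upload to a BE, so curl needs --location-trusted to resend credentials.
-- Add -H 'strip_outer_array: true' if data.json holds a JSON array of rows.
-- curl --location-trusted -u root: -H 'Expect: 100-continue' -H 'format: json' -T data.json http://fe:8030/api/db/page_views/_stream_load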

-- Query with standard SQL
SELECT page_url, SUM(view_count) as total_views
FROM page_views
WHERE event_date >= '2026-04-01'
GROUP BY page_url
ORDER BY total_views DESC
LIMIT 10;
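To make the AGGREGATE KEY semantics of the table above concrete, here is a small Python model of what happens to rows that share the same key columns: their `view_count` values are summed into one row. This is only a simulation of the visible result, not how Doris implements merging internally, and the sample rows are invented for illustration.

```python
from collections import defaultdict


def merge_aggregate_rows(rows):
    """Merge rows sharing (event_date, page_url, user_id), summing view_count."""
    totals = defaultdict(int)
    for event_date, page_url, user_id, view_count in rows:
        totals[(event_date, page_url, user_id)] += view_count
    # Return one row per distinct key, in sorted key order.
    return [key + (count,) for key, count in sorted(totals.items())]


# Invented sample data: the first two rows share the same key.
loaded = [
    ("2026-04-01", "/home", 42, 1),
    ("2026-04-01", "/home", 42, 3),
    ("2026-04-01", "/pricing", 7, 2),
]
for row in merge_aggregate_rows(loaded):
    print(row)
# ('2026-04-01', '/home', 42, 4)
# ('2026-04-01', '/pricing', 7, 2)
```

The consequence for table design is that individual events are no longer recoverable once merged, which is one reason the choice between Aggregate, Unique, and Duplicate models matters.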

§05

Related on TokRepo

Database tools -- Database management and analytics
Monitoring tools -- Dashboards and real-time monitoring

§06

Common pitfalls

Doris requires separate FE and BE processes; minimum production setup is 3 FE nodes and 3 BE nodes for high availability
Choosing the wrong data model (Aggregate, Unique, or Duplicate) affects query performance significantly; read the model guide before designing tables
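Stream Load accepts up to 10 GB per request by default (BE setting streaming_load_max_mb = 10240); JSON payloads have a separate, lower cap (streaming_load_json_max_mb, 100 MB by default). Split large imports into smaller requests.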
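One way to stay under a per-request size limit is to split records into batches before uploading. The sketch below groups records so that each batch, encoded as a compact JSON array, fits within a caller-chosen `max_bytes`; the function names and the tiny limit in the usage example are hypothetical.

```python
import json

COMPACT = (",", ":")  # separators without spaces, so sizes are predictable


def encode_batch(batch):
    """Encode a list of records as a compact JSON array in UTF-8."""
    return json.dumps(batch, separators=COMPACT).encode("utf-8")


def batch_records(records, max_bytes):
    """Yield lists of records whose encode_batch() size is at most max_bytes."""
    batch, size = [], 2  # 2 bytes for the enclosing "[" and "]"
    for record in records:
        encoded = len(json.dumps(record, separators=COMPACT).encode("utf-8"))
        if encoded + 2 > max_bytes:
            raise ValueError("a single record exceeds max_bytes")
        extra = encoded + (1 if batch else 0)  # 1 byte for the "," separator
        if batch and size + extra > max_bytes:
            yield batch
            batch, size, extra = [], 2, encoded
        batch.append(record)
        size += extra
    if batch:
        yield batch


# Usage with a deliberately tiny limit so the splitting is visible.
for part in batch_records([{"a": 1}] * 5, max_bytes=20):
    print(len(part), len(encode_batch(part)))
```

In practice `max_bytes` would be set comfortably below the configured Stream Load limit, and each batch would be sent as its own request with its own label.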

Frequently asked questions

How does Doris compare to ClickHouse?

Both are columnar OLAP databases. Doris emphasizes MySQL protocol compatibility and simpler cluster operations. ClickHouse offers more analytical functions and is often reported to be faster on raw queries, though this depends on the workload. Doris is often preferred when MySQL compatibility and operational simplicity matter more than peak query speed.

Does Doris support real-time data ingestion?

Yes. Doris supports Stream Load for HTTP-based ingestion, Routine Load for continuous Kafka consumption, and Broker Load for batch imports from HDFS or S3. Data is queryable within seconds of ingestion.

Can Doris query data lakes?

Yes. Doris supports federated queries across Hive, Iceberg, Hudi, and Delta Lake catalogs. You register external catalogs and query them with standard SQL alongside Doris internal tables.

What is the minimum hardware for Doris?

A single FE and single BE node can run on a machine with 4 cores and 16 GB RAM for testing. Production deployments should have at least 3 FE nodes and 3 BE nodes with SSDs for optimal performance.

Is Apache Doris open source?

Yes. Apache Doris is an Apache Software Foundation top-level project under the Apache 2.0 license. Commercial distributions like SelectDB offer managed hosting and enterprise support.


