StarRocks — High-Performance Analytical Database with MySQL Protocol

Introduction

StarRocks originated at a Chinese analytics startup as a fork of an early Apache Doris release and has since diverged into a separate open-source OLAP engine that ranks near the top of published analytical benchmarks. It combines a fully vectorized execution engine, a cost-based optimizer (CBO), and query-time materialized view (MV) rewriting, and it frequently outperforms ClickHouse, Doris, and Presto on TPC-DS and real workloads.

With over 11,000 GitHub stars, StarRocks is used by Trip.com, Airbnb, Pinterest, and Tencent. It speaks the MySQL wire protocol, so BI tools and clients that support MySQL can connect to it. It supports federated queries across Iceberg, Hive, and Hudi, and it runs either self-hosted or as a managed service from CelerData.

What StarRocks Does

StarRocks ingests data into its own columnar format (native tables) or queries external sources in place (Iceberg, Hive, Hudi, JDBC, object storage). Its CBO chooses join orders and applies materialized view rewrites automatically. Vectorized execution processes data in column batches to make full use of CPU SIMD registers. Primary Key tables allow real upserts, which many OLAP engines support only partially.
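
As a minimal sketch of the upsert behavior described above, the statements below create a Primary Key table and write the same key twice. The column names follow the orders table used later in this document; the column types, the bucket count, and the sample values are assumptions for illustration only.

```sql
-- Hypothetical schema: types and bucket count are assumptions, not from the source
CREATE TABLE orders (
  order_id BIGINT NOT NULL,
  user_id  BIGINT,
  amount   DECIMAL(12, 2),
  status   VARCHAR(32),
  ts       DATETIME
)
PRIMARY KEY (order_id)
DISTRIBUTED BY HASH (order_id) BUCKETS 4;

-- First write inserts the row
INSERT INTO orders VALUES (1001, 42, 19.99, 'paid', '2026-04-01 10:00:00');

-- Second write with the same primary key replaces the row (upsert)
INSERT INTO orders VALUES (1001, 42, 19.99, 'refunded', '2026-04-01 10:05:00');

-- Row-level update of a single column on a Primary Key table
UPDATE orders SET status = 'closed' WHERE order_id = 1001;

-- Exactly one row is expected for this key
SELECT order_id, status FROM orders WHERE order_id = 1001;
```

On a single-node test cluster the default replica count may need to be lowered through table properties before the CREATE TABLE succeeds.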

Architecture Overview

BI tools (Tableau, Superset, Looker) -> MySQL wire protocol
        |
  [FE — Frontend nodes]
   SQL parsing, CBO, metadata
   HA via BDBJE
        |
  +--------+--------+
  |        |        |
 [BE]    [BE]    [BE]
  Vectorized engine
  Columnar storage (native)
  Tablet replication
        |
  [External Catalogs]
   Iceberg, Hive, Hudi,
   JDBC, object storage
        |
  [Materialized Views]
   async refresh
   optimizer auto-rewrites matching queries
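
The components in the diagram can be inspected from any MySQL client connected to an FE node. The statements below are a small sketch of that check; the output columns vary by StarRocks version, so none are shown here.

```sql
-- List FE nodes with their role and liveness
SHOW FRONTENDS;

-- List BE nodes with tablet counts and disk usage
SHOW BACKENDS;

-- List catalogs: the built-in internal catalog plus any external ones
SHOW CATALOGS;
```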

Self-Hosting & Configuration

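-- Register an Iceberg catalog backed by a Hive Metastore so lake tables are queryable by name
CREATE EXTERNAL CATALOG iceberg PROPERTIES (
  "type" = "iceberg",
  "iceberg.catalog.type" = "hive",
  "hive.metastore.uris" = "thrift://metastore:9083"
);

-- Federated query: join the native orders table with Iceberg lake tables (catalog.database.table)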
SELECT o.user_id, l.country, SUM(o.amount)
FROM orders o
JOIN iceberg.db.users u ON o.user_id = u.id
JOIN iceberg.db.locations l ON u.location_id = l.id
WHERE o.ts >= '2026-04-01'
GROUP BY o.user_id, l.country;

-- Stream ingest from Kafka (Routine Load)
CREATE ROUTINE LOAD orders_stream ON orders
COLUMNS (order_id, user_id, amount, status, ts)
PROPERTIES (
  "format" = "json",
  "jsonpaths" = '["$.order_id","$.user_id","$.amount","$.status","$.ts"]'
)
FROM KAFKA (
  "kafka_broker_list" = "kafka:9092",
  "kafka_topic" = "orders"
);
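
After running the statements above, it is usually worth confirming that the catalog is visible and that the Routine Load job is consuming. The following is a sketch of those follow-up checks, reusing the iceberg catalog and orders_stream job names from the example; it assumes the session is already in the database that owns the job.

```sql
-- Confirm the Iceberg catalog is registered and browse its databases
SHOW CATALOGS;
SHOW DATABASES FROM iceberg;

-- Check the state, progress, and error rows of the Routine Load job
SHOW ROUTINE LOAD FOR orders_stream;

-- Pause consumption (for example, before a schema change), then resume it
PAUSE ROUTINE LOAD FOR orders_stream;
RESUME ROUTINE LOAD FOR orders_stream;
```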

Key Features

Vectorized + CBO — strong results on TPC-DS and on real-world workloads
MySQL compatible — most BI tools and MySQL clients connect without a custom driver
Materialized view rewrites — optimizer uses MVs transparently
Primary Key tables — fast upserts + partial updates
Federated queries — native + lake tables in one SQL
Real-time ingest — Routine Load (Kafka), Flink CDC, Stream Load
Storage-compute separation (3.0+) — elastic compute on object storage
Active development — monthly releases, fast bug-fix cadence
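
The materialized view rewrite listed above can be sketched as follows. The view name daily_revenue_mv, the hourly refresh interval, and the orders column types are assumptions for illustration; whether the optimizer actually rewrites a given query depends on the StarRocks version, the refresh state of the view, and session settings.

```sql
-- Hypothetical async MV: pre-aggregates revenue per day and user
CREATE MATERIALIZED VIEW daily_revenue_mv
DISTRIBUTED BY HASH (order_date)
REFRESH ASYNC EVERY (INTERVAL 1 HOUR)
AS
SELECT date_trunc('day', ts) AS order_date,
       user_id,
       SUM(amount) AS revenue
FROM orders
GROUP BY date_trunc('day', ts), user_id;

-- Trigger a refresh manually instead of waiting for the schedule
REFRESH MATERIALIZED VIEW daily_revenue_mv;

-- The query targets the base table; if the rewrite applies, the plan
-- scans daily_revenue_mv instead of orders
EXPLAIN
SELECT date_trunc('day', ts) AS order_date, SUM(amount) AS revenue
FROM orders
GROUP BY date_trunc('day', ts);
```

Queries keep referencing the base table, so dashboards do not need to change when a view is added or dropped.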

Comparison with Similar Tools

Feature	StarRocks	Doris	ClickHouse	Snowflake	Presto/Trino
Dialect	MySQL	MySQL	Own	Snowflake SQL	ANSI SQL
Upserts	Yes (PK)	Yes (Unique)	Limited	Yes	Connector-dependent (MERGE)
MV rewrite	Yes (async MV)	Yes	Manual	Yes	No
Federated queries	Yes	Yes	Yes (via engines)	Yes (Iceberg)	Yes (focus)
Storage/compute separation	Yes (3.x)	Partial	Limited	Yes	Yes (compute only)
Best For	Real-time + lake OLAP	Self-serve BI	Raw scan speed	Managed DW	Federated SQL

FAQ

Q: StarRocks vs Apache Doris: are they the same project? A: No. StarRocks forked from an early Doris version and has diverged significantly. StarRocks tends to lead in published performance benchmarks; Doris has ASF governance and a broader community. Try both on your workload.

Q: StarRocks vs ClickHouse? A: ClickHouse is simpler to run on a single node and often wins pure scan benchmarks. StarRocks generally handles high concurrency and multi-table joins better, and it adds MV rewrites, the MySQL protocol, and federated lake queries.

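Q: Is storage-compute separation important? A: Yes, for elastic or bursty workloads. From 3.0 onward, StarRocks can keep primary data in object storage such as S3 or GCS and scale compute nodes independently. This resembles Snowflake's architecture and can substantially reduce costs for spiky workloads, because compute no longer has to be sized for the full dataset.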

Q: Is StarRocks open source? A: Yes, Apache-2.0. CelerData provides a managed/cloud version for those who prefer not to self-host.

Sources

GitHub: https://github.com/StarRocks/starrocks
Docs: https://docs.starrocks.io
Company: CelerData
License: Apache-2.0
