Configs · Apr 14, 2026 · 3 min read

StarRocks — High-Performance Analytical Database with MySQL Protocol

StarRocks is an MPP (massively parallel processing) database built for fast analytical queries on large datasets. Benchmarks frequently place it at or near the top among open-source OLAP engines, and it combines MySQL protocol compatibility with support for data lake queries.

Introduction

StarRocks originated at a Chinese BI startup as a fork of an early version of Apache Doris, and it has since become a regular top performer in open-source OLAP benchmarks. With a fully vectorized execution engine, a cost-based optimizer (CBO), and query-time materialized view (MV) rewriting, it frequently outperforms ClickHouse, Doris, and Presto on TPC-DS and real workloads.

With over 11,000 GitHub stars, StarRocks is used by Trip.com, Airbnb, Pinterest, and Tencent. It speaks the MySQL wire protocol, so BI tools that support MySQL can connect directly. It also supports federated queries across Iceberg, Hive, and Hudi, and it runs either self-hosted or as CelerData, the managed service.

What StarRocks Does

StarRocks either ingests data into its own columnar format (native tables) or queries external sources in place (Iceberg, Hive, Hudi, JDBC, object storage). Its CBO chooses join orders and materialized view rewrites automatically. Vectorized execution processes data in column batches so the engine can take advantage of CPU SIMD instructions. Primary Key tables allow real upserts, something most OLAP engines skimp on.
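As a rough illustration of the Primary Key upsert behavior described above, the sketch below defines a small Primary Key table and writes the same key twice. The column names mirror the orders examples used elsewhere in this article; the column types, sample values, bucket count, and single-replica setting are assumptions suited to a small test cluster, not recommended production settings.

```sql
-- Hypothetical schema: column types and sizes are assumed for illustration.
CREATE TABLE orders (
    order_id BIGINT NOT NULL,
    user_id  BIGINT,
    amount   DECIMAL(18, 2),
    status   VARCHAR(32),
    ts       DATETIME
)
PRIMARY KEY (order_id)
DISTRIBUTED BY HASH (order_id) BUCKETS 4      -- assumption: small test table
PROPERTIES ("replication_num" = "1");         -- assumption: single-BE test cluster

-- First write: inserts a new row for order 1001.
INSERT INTO orders VALUES (1001, 42, 19.99, 'created', '2026-04-01 10:00:00');

-- Second write with the same primary key: replaces the existing row
-- rather than appending a duplicate.
INSERT INTO orders VALUES (1001, 42, 19.99, 'paid', '2026-04-01 10:05:00');

-- Check the current state of the row.
SELECT order_id, status, ts FROM orders WHERE order_id = 1001;
```

On a Primary Key table the final query should return a single row for order 1001 carrying the most recently written values.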

Architecture Overview

BI tools (Tableau, Superset, Looker) -> MySQL wire protocol
        |
  [FE — Frontend nodes]
   SQL parsing, CBO, metadata
   HA via BDBJE
        |
  +--------+--------+
  |        |        |
 [BE]    [BE]    [BE]
  Vectorized engine
  Columnar storage (native)
  Tablet replication
        |
  [External Catalogs]
   Iceberg, Hive, Hudi,
   JDBC, object storage
        |
  [Materialized Views]
   async refresh
   optimizer auto-rewrites matching queries
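
To relate the diagram to a running cluster, the statements below can be issued from any MySQL client connected to an FE node. This is a minimal sketch: the host name in the commented-out statement is a placeholder, and port 9050 is the usual default BE heartbeat port, so treat both as assumptions for your deployment.

```sql
-- List FE nodes, including their role and whether each is alive.
SHOW FRONTENDS;

-- List BE nodes with tablet counts and disk usage.
SHOW BACKENDS;

-- List the default internal catalog plus any external catalogs.
SHOW CATALOGS;

-- Register an additional BE node (placeholder host; port assumed to be the default).
-- ALTER SYSTEM ADD BACKEND "be-host-4:9050";
```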

Self-Hosting & Configuration

-- Federated query: join StarRocks native table with Iceberg lake table
CREATE EXTERNAL CATALOG iceberg PROPERTIES (
  "type" = "iceberg",
  "iceberg.catalog.type" = "hive",
  "hive.metastore.uris" = "thrift://metastore:9083"
);

SELECT o.user_id, l.country, SUM(o.amount) AS total_amount
FROM orders o
JOIN iceberg.db.users u ON o.user_id = u.id
JOIN iceberg.db.locations l ON u.location_id = l.id
WHERE o.ts >= '2026-04-01'
GROUP BY o.user_id, l.country;

-- Stream ingest from Kafka (Routine Load)
CREATE ROUTINE LOAD orders_stream ON orders
COLUMNS (order_id, user_id, amount, status, ts)
PROPERTIES ("format" = "json", "jsonpaths" = '["$.order_id","$.user_id","$.amount","$.status","$.ts"]')
FROM KAFKA (
  "kafka_broker_list" = "kafka:9092",
  "kafka_topic" = "orders"
);
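
Once a Routine Load job such as orders_stream exists, it usually helps to check that it is actually consuming from Kafka. The statements below are a small sketch of that follow-up step; they reuse the job name from the example above and assume the current database is the one where the job was created.

```sql
-- Show the job's state (for example RUNNING or PAUSED), progress, and error statistics.
SHOW ROUTINE LOAD FOR orders_stream;

-- Temporarily stop consuming, for example before changing the target table.
PAUSE ROUTINE LOAD FOR orders_stream;

-- Continue consuming from where the job left off.
RESUME ROUTINE LOAD FOR orders_stream;
```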

Key Features

  • Vectorized engine + CBO: consistently strong results on TPC-DS and real-world workloads
  • MySQL compatible: BI tools and ORMs that speak MySQL typically connect without changes
  • Materialized view rewrites: the optimizer uses matching MVs transparently
  • Primary Key tables: fast upserts and partial updates
  • Federated queries: native and lake tables in one SQL statement
  • Real-time ingest: Routine Load (Kafka), Flink CDC, Stream Load
  • Storage-compute separation (3.0+): elastic compute on object storage
  • Active development: monthly releases, fast bug-fix cadence
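
The materialized view rewrite feature in the list above can be sketched as follows. The view name daily_user_spend, the hourly refresh interval, and the columns of the orders table are assumptions for illustration; whether the optimizer actually rewrites a given query depends on the StarRocks version and on the view being refreshed.

```sql
-- Hypothetical async MV that pre-aggregates spend per user per day.
CREATE MATERIALIZED VIEW daily_user_spend
DISTRIBUTED BY HASH (user_id)
REFRESH ASYNC EVERY (INTERVAL 1 HOUR)   -- assumption: hourly refresh is fresh enough
AS
SELECT
    user_id,
    date_trunc('day', ts) AS order_day,
    SUM(amount)           AS total_amount
FROM orders
GROUP BY user_id, date_trunc('day', ts);

-- The query still targets the base table. If the rewrite applies,
-- the plan should scan daily_user_spend instead of orders.
EXPLAIN
SELECT user_id, SUM(amount) AS total_amount
FROM orders
GROUP BY user_id;
```

Because queries keep referencing the base table, the view can be added or dropped without changing dashboards or application SQL.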

Comparison with Similar Tools

Feature                    | StarRocks             | Doris         | ClickHouse        | Snowflake     | Presto/Trino
---------------------------|-----------------------|---------------|-------------------|---------------|-------------------
Dialect                    | MySQL                 | MySQL         | Own               | Snowflake SQL | ANSI SQL
Upserts                    | Yes (PK)              | Yes (Unique)  | Limited           | Yes           | No
MV rewrite                 | Yes (async MV)        | Yes           | Manual            | Yes           | No
Federated queries          | Yes                   | Yes           | Yes (via engines) | Yes (Iceberg) | Yes (focus)
Storage/compute separation | Yes (3.x)             | Partial       | Limited           | Yes           | Yes (compute only)
Best for                   | Real-time + lake OLAP | Self-serve BI | Raw scan speed    | Managed DW    | Federated SQL

FAQ

Q: StarRocks vs Apache Doris: same project?
A: No. StarRocks forked from an early Doris version and has diverged significantly. StarRocks usually wins performance benchmarks; Doris has ASF governance and a broader community. Try both on your workload.

Q: StarRocks vs ClickHouse?
A: ClickHouse is simpler to run on a single node and often wins pure scan benchmarks. StarRocks has better concurrency, better join performance, MV rewrites, MySQL protocol, and federated lake queries.

Q: Is storage-compute separation important?
A: Very. From version 3.0 onward, StarRocks can keep primary data in object storage such as S3 or GCS and scale compute nodes elastically. This resembles Snowflake's architecture and can substantially reduce costs for spiky workloads.

Q: Is StarRocks open source?
A: Yes, Apache-2.0. CelerData provides a managed/cloud version for those who prefer not to self-host.
