Configs · Apr 14, 2026 · 3 min read

Databend — Cloud-Native Open-Source Data Warehouse Built in Rust

Databend is a modern cloud data warehouse that separates storage from compute and keeps its data in object storage. Written in Rust for performance, it is a self-hostable alternative to Snowflake with broad Snowflake-style SQL compatibility.

Introduction

Databend is a modern analytical data warehouse built from scratch in Rust, with an architecture inspired by Snowflake: stateless compute nodes that read and write data in object storage. That means you pay for bytes in S3 (cheap) and scale compute up/down on demand — no local storage management.

With over 9,000 GitHub stars, Databend is used by teams looking for an open-source Snowflake alternative. SQL compatibility is strong enough that many Snowflake queries move over unchanged.

What Databend Does

Databend stores tables as open-format Parquet/ORC files in object storage and keeps the catalog in its own metadata service (databend-meta, a Raft-replicated cluster). Queries run through a vectorized engine that is optimized for late materialization, predicate pushdown, and filter caching.
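To make predicate pushdown concrete, here is a minimal Python sketch of min/max block pruning, one common way an engine avoids reading files that cannot match a filter. It is a toy model and not Databend's implementation: the `Block` class, `make_block`, and `scan` are hypothetical names invented for this illustration, and an in-memory list stands in for a Parquet file.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One data block plus the min/max statistics stored beside it (hypothetical)."""
    rows: list      # stands in for the contents of one Parquet file
    min_ts: int
    max_ts: int


def make_block(rows):
    """Build a block and record the min/max of its ts column."""
    ts = [r["ts"] for r in rows]
    return Block(rows=rows, min_ts=min(ts), max_ts=max(ts))


def scan(blocks, lo, hi):
    """Return (rows with lo <= ts <= hi, number of blocks actually read)."""
    out, blocks_read = [], 0
    for b in blocks:
        # Pruning step: if the block's [min, max] range cannot overlap
        # the filter range, skip it without touching its rows.
        if b.max_ts < lo or b.min_ts > hi:
            continue
        blocks_read += 1
        out.extend(r for r in b.rows if lo <= r["ts"] <= hi)
    return out, blocks_read


blocks = [
    make_block([{"ts": t, "event": f"e{t}"} for t in range(0, 100)]),
    make_block([{"ts": t, "event": f"e{t}"} for t in range(100, 200)]),
    make_block([{"ts": t, "event": f"e{t}"} for t in range(200, 300)]),
]
rows, blocks_read = scan(blocks, 120, 130)
print(len(rows), blocks_read)  # 11 1
```

Only one of the three blocks is read for this filter. On object storage, each skipped block is a network request that never happens, which is why this kind of pruning matters more there than on local disk.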

Architecture Overview

Clients (MySQL, HTTP, ClickHouse protocol)
        |
   [Query Nodes (Databend)]
   stateless, scale elastically
        |
   [Meta Service]
   table schemas, versions, auth
   (databend-meta, a Raft-replicated cluster)
        |
   [Object Storage]
   S3 / GCS / Azure / OCI / MinIO / HDFS
   Parquet/ORC + small index files
        |
   [External Data Sources]
   Iceberg, Hive, CSV/JSON/TSV, Kafka, Snowflake
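The layering in the diagram can be sketched as a small Python model. This is an illustration under stated assumptions, not Databend code: `ObjectStore`, `MetaService`, and `QueryNode` are hypothetical classes, dictionaries stand in for S3 and the catalog, and concurrent commits are ignored.

```python
class ObjectStore:
    """Stands in for S3/GCS/MinIO: immutable blobs addressed by key."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, rows):
        self._blobs[key] = tuple(rows)

    def get(self, key):
        return self._blobs[key]


class MetaService:
    """Stands in for the catalog: maps a table name to its current file keys."""

    def __init__(self):
        self._tables = {}

    def commit(self, table, keys):
        self._tables[table] = tuple(keys)

    def current_files(self, table):
        return self._tables.get(table, ())


class QueryNode:
    """Keeps no table data of its own; every query goes to meta + storage."""

    def __init__(self, meta, store):
        self.meta, self.store = meta, store

    def insert(self, table, rows):
        # Write a new immutable file, then publish it through the catalog.
        # A real system would guard this commit against concurrent writers.
        files = self.meta.current_files(table)
        key = f"{table}/part-{len(files)}"
        self.store.put(key, rows)
        self.meta.commit(table, files + (key,))

    def count(self, table):
        return sum(len(self.store.get(k)) for k in self.meta.current_files(table))


store, meta = ObjectStore(), MetaService()
node_a = QueryNode(meta, store)
node_a.insert("events", [{"user_id": 1}, {"user_id": 2}])

node_b = QueryNode(meta, store)  # a node added later, with no warm-up or data copy
node_b.insert("events", [{"user_id": 3}])

print(node_a.count("events"), node_b.count("events"))  # 3 3
```

Because the nodes hold no state, adding or removing one changes compute capacity only. The table itself lives entirely in the catalog and the object store.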

Self-Hosting & Configuration

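-- Databend "warehouses" isolate compute per workload
CREATE WAREHOUSE etl WITH WAREHOUSE_SIZE = 'Large';
CREATE WAREHOUSE bi  WITH WAREHOUSE_SIZE = 'Medium';

-- Switch the current session to a specific warehouse
USE WAREHOUSE etl;

-- Batch COPY + MERGE (CDC-style upsert).
-- Databend's documented COPY INTO form selects files with PATTERN (a regular
-- expression) rather than a glob in the URI, and lists copy options such as
-- FORCE after FILE_FORMAT. A private bucket also needs a CONNECTION = (...)
-- clause with credentials, which is omitted here.
COPY INTO raw_events FROM 's3://bucket/cdc/'
  PATTERN = '.*[.]parquet'
  FILE_FORMAT = (TYPE = PARQUET)
  FORCE = true;

-- Upsert keyed on (user_id, ts). Rows flagged is_delete are filtered out
-- of the source, so this statement does not propagate deletes.
MERGE INTO events AS tgt
USING (SELECT * FROM raw_events WHERE is_delete = false) AS src
  ON tgt.user_id = src.user_id AND tgt.ts = src.ts
WHEN MATCHED THEN UPDATE SET tgt.event = src.event
WHEN NOT MATCHED THEN INSERT (user_id, ts, event) VALUES (src.user_id, src.ts, src.event);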
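For readers who want to check what the MERGE statement above does row by row, the following Python sketch mirrors its semantics on plain dictionaries. It is a model for reasoning, not a Databend client: `merge_events` is a hypothetical helper, it assumes (user_id, ts) is unique in the target, and if the source repeats a key it lets the last row win, whereas a SQL engine may reject that case.

```python
def merge_events(target, source):
    """Upsert source rows into target, keyed on (user_id, ts)."""
    merged = {(r["user_id"], r["ts"]): dict(r) for r in target}
    for src in source:
        if src.get("is_delete"):
            # Mirrors the USING subquery: WHERE is_delete = false
            continue
        key = (src["user_id"], src["ts"])
        if key in merged:
            # WHEN MATCHED THEN UPDATE SET tgt.event = src.event
            merged[key]["event"] = src["event"]
        else:
            # WHEN NOT MATCHED THEN INSERT (user_id, ts, event)
            merged[key] = {
                "user_id": src["user_id"],
                "ts": src["ts"],
                "event": src["event"],
            }
    return list(merged.values())


target = [{"user_id": 1, "ts": 100, "event": "view"}]
source = [
    {"user_id": 1, "ts": 100, "event": "click", "is_delete": False},  # matched
    {"user_id": 2, "ts": 101, "event": "view", "is_delete": False},   # not matched
    {"user_id": 3, "ts": 102, "event": "view", "is_delete": True},    # filtered out
]
result = merge_events(target, source)
print(result)
```

The third source row is dropped before matching, so a delete marker never reaches the target. A pipeline that must propagate deletes needs a separate step or a delete branch in the MERGE.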

Key Features

  • Stateless compute — scale warehouses elastically, pay per use
  • Object storage first — S3/GCS/Azure/MinIO; data is just Parquet files
  • MySQL wire protocol — BI tools connect unchanged
  • Snowflake-like SQL — VARIANT, time travel, MERGE, COPY INTO
  • Rust-native performance — vectorized execution, modern CPU features
  • Streaming ingest — CDC from Kafka, TableStreams API
  • Time travel — query historical table versions by timestamp
  • Data sharing — cross-account table shares like Snowflake
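Time travel by timestamp, listed above, follows naturally from immutable files: each commit produces a new snapshot, and a historical read picks the latest snapshot at or before the requested time. The Python sketch below shows that lookup. `VersionedTable` is a hypothetical class for illustration, integers stand in for timestamps, and full row tuples stand in for snapshot metadata; it makes no claim about how Databend stores or expires snapshots.

```python
import bisect


class VersionedTable:
    """Toy table that keeps every committed snapshot (hypothetical)."""

    def __init__(self):
        self._commit_times = []  # ascending commit timestamps
        self._snapshots = []     # snapshot i holds the rows as of commit i

    def commit(self, ts, rows):
        if self._commit_times and ts <= self._commit_times[-1]:
            raise ValueError("commit timestamps must increase")
        self._commit_times.append(ts)
        self._snapshots.append(tuple(rows))

    def read(self, at=None):
        """Return the latest rows, or the rows as of timestamp `at`."""
        if not self._snapshots:
            return ()
        if at is None:
            return self._snapshots[-1]
        # Index of the last commit whose timestamp is <= at.
        i = bisect.bisect_right(self._commit_times, at) - 1
        if i < 0:
            raise LookupError("no snapshot at or before the requested time")
        return self._snapshots[i]


t = VersionedTable()
t.commit(10, ["a"])
t.commit(20, ["a", "b"])
print(t.read(at=15), t.read())  # ('a',) ('a', 'b')
```

Old snapshots cost storage for as long as they are retained, so the time-travel window in a real deployment is bounded by a retention setting.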

Comparison with Similar Tools

| Feature       | Databend                          | Snowflake          | ClickHouse Cloud       | BigQuery        | DuckDB                   |
|---------------|-----------------------------------|--------------------|------------------------|-----------------|--------------------------|
| Storage       | Object storage                    | Proprietary        | Object storage (cloud) | Proprietary     | Local files              |
| Compute model | Elastic warehouses                | Elastic warehouses | Cloud-native clusters  | Serverless      | Single-process           |
| SQL dialect   | Snowflake-like                    | Snowflake          | ClickHouse             | BigQuery        | DuckDB                   |
| Self-host     | Yes                               | No                 | No (self-host core)    | No              | N/A (embedded)           |
| Time travel   | Yes                               | Yes                | Limited                | Yes (snapshots) | No                       |
| Best for      | Open-source Snowflake alternative | Managed DW         | Log-heavy analytics    | GCP shops       | Single-machine analytics |

FAQ

Q: Databend vs ClickHouse? A: Databend is architected for cloud-native storage-compute separation; ClickHouse is a high-performance local/cluster engine. Databend's SQL and storage model are closer to Snowflake; ClickHouse is closer to columnar MPP systems. Pick Databend for S3-first workflows; ClickHouse for raw speed on local disks.

Q: How mature is Databend? A: v1.x since 2023, active monthly releases. Production use by several Chinese and international teams. SQL surface is broad enough for real analytics workloads.

Q: Can it replace Snowflake? A: For many mid-sized analytical workloads, yes — and you own the infrastructure. For teams deeply integrated with Snowflake's ecosystem (Snowpark, etc.), the transition is harder.

Q: Is Databend truly open source? A: The core is Apache-2.0. Databend Cloud (managed) is a paid service. The project is actively maintained by Datafuse Labs.
