Introduction
Databend is a modern analytical data warehouse written from scratch in Rust, with an architecture inspired by Snowflake: stateless compute nodes read and write data in object storage. That means you pay for cheap bytes in S3 and scale compute up or down on demand, with no local storage to manage.
With over 9,000 GitHub stars, Databend is used by teams looking for an open-source Snowflake alternative. SQL compatibility is strong enough that many Snowflake queries move over unchanged.
What Databend Does
Databend stores tables as open-format Parquet/ORC files in object storage and uses its own metadata service (backed by FoundationDB, MySQL, or Postgres) as the catalog. Queries run through a vectorized engine that has been heavily optimized for late materialization, predicate pushdown, and filter caching.
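As a minimal sketch of what this looks like day to day (the table, column names, and values below are illustrative, not from the Databend docs):

```sql
-- Create a table; its data lands as Parquet files in the configured object storage
CREATE TABLE page_views (
    user_id BIGINT,
    url     STRING,
    props   VARIANT,      -- semi-structured JSON, Snowflake-style
    ts      TIMESTAMP
);

INSERT INTO page_views VALUES
    (1, '/home', PARSE_JSON('{"ref": "ads"}'), NOW());

-- VARIANT fields are queryable with path syntax
SELECT user_id, props['ref'] FROM page_views WHERE url = '/home';
```

Nothing here references a local disk: the table's Parquet segments and the small index files beside them live entirely in the object store, and the metadata service tracks which files make up the current table version.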
Architecture Overview
Clients (MySQL, HTTP, ClickHouse protocol)
|
[Query Nodes (Databend)]
stateless, scale elastically
|
[Meta Service]
table schemas, versions, auth
(FoundationDB / Postgres / MySQL as backend)
|
[Object Storage]
S3 / GCS / Azure / OCI / MinIO / HDFS
Parquet/ORC + small index files
|
[External Data Sources]
Iceberg, Hive, CSV/JSON/TSV, Kafka, Snowflake
Self-Hosting & Configuration
-- Databend "Warehouses" allow compute isolation per workload
CREATE WAREHOUSE etl WITH WAREHOUSE_SIZE = 'Large';
CREATE WAREHOUSE bi WITH WAREHOUSE_SIZE = 'Medium';
-- Connect into a specific warehouse
USE WAREHOUSE etl;
-- Streaming-style COPY + transform + MERGE (CDC pattern)
COPY INTO raw_events FROM 's3://bucket/cdc/'
PATTERN = '.*[.]parquet' FILE_FORMAT = (TYPE = PARQUET) FORCE = TRUE;
MERGE INTO events AS tgt
USING (SELECT * FROM raw_events WHERE is_delete = false) AS src
ON tgt.user_id = src.user_id AND tgt.ts = src.ts
WHEN MATCHED THEN UPDATE SET tgt.event = src.event
WHEN NOT MATCHED THEN INSERT (user_id, ts, event) VALUES (src.user_id, src.ts, src.event);
Key Features
- Stateless compute — scale warehouses elastically, pay per use
- Object storage first — S3/GCS/Azure/MinIO; data is just Parquet files
- MySQL wire protocol — BI tools connect unchanged
- Snowflake-like SQL — VARIANT, time travel, MERGE, COPY INTO
- Rust-native performance — vectorized execution, modern CPU features
- Streaming ingest — CDC from Kafka, TableStreams API
- Time travel — query historical table versions by timestamp
- Data sharing — cross-account table shares like Snowflake
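Time travel, for example, is exposed through an AT clause on SELECT (the table name is illustrative, and the snapshot ID is a placeholder you would read from the table's snapshot metadata):

```sql
-- Query the table as it existed at a given point in time
SELECT * FROM events AT (TIMESTAMP => '2024-05-01 00:00:00'::TIMESTAMP);

-- Or pin the query to a specific table snapshot
SELECT * FROM events AT (SNAPSHOT => '<snapshot_id>');
```

Because every write produces a new immutable table version in object storage, these historical reads are just pointer lookups in the metadata service, not restores from backup.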
Comparison with Similar Tools
| Feature | Databend | Snowflake | ClickHouse Cloud | BigQuery | DuckDB |
|---|---|---|---|---|---|
| Storage | Object storage | Proprietary | Object storage (cloud) | Proprietary | Local files |
| Compute model | Elastic warehouses | Elastic warehouses | Cloud-native clusters | Serverless | Single-process |
| SQL dialect | Snowflake-like | Snowflake | ClickHouse | BigQuery | DuckDB |
| Self-host | Yes | No | No (OSS core is self-hostable) | No | N/A (embedded) |
| Time travel | Yes | Yes | Limited | Yes (snapshots) | No |
| Best For | Open-source Snowflake alternative | Managed DW | Log-heavy analytics | GCP shops | Single-machine analytics |
FAQ
Q: Databend vs ClickHouse? A: Databend is architected for cloud-native storage-compute separation; ClickHouse is a high-performance local/cluster engine. Databend's SQL and storage model are closer to Snowflake; ClickHouse is closer to columnar MPP systems. Pick Databend for S3-first workflows; ClickHouse for raw speed on local disks.
Q: How mature is Databend? A: v1.x since 2023, active monthly releases. Production use by several Chinese and international teams. SQL surface is broad enough for real analytics workloads.
Q: Can it replace Snowflake? A: For many mid-sized analytical workloads, yes — and you own the infrastructure. For teams deeply integrated with Snowflake's ecosystem (Snowpark, etc.), the transition is harder.
Q: Is Databend truly open source? A: The core is Apache-2.0. Databend Cloud (managed) is a paid service. The project is actively maintained by Datafuse Labs.
Sources
- GitHub: https://github.com/databendlabs/databend
- Docs: https://docs.databend.com
- Company: Datafuse Labs
- License: Apache-2.0