# Databend — Cloud-Native Open-Source Data Warehouse Built in Rust

> Databend is a modern cloud data warehouse with separation of storage and compute on object storage. Written in Rust for high performance, it is a self-hostable alternative to Snowflake with broadly Snowflake-compatible SQL.

## Quick Use

```bash
# Single-binary Docker image for a quick test
docker run -d --name databend \
  -p 8000:8000 -p 3307:3307 \
  datafuselabs/databend

# Connect via a MySQL-compatible client (port 3307)
mysql -h 127.0.0.1 -P 3307 -uroot

# Or the HTTP REST API (port 8000)
curl -u root: -XPOST http://localhost:8000/v1/query \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT 1+1"}'
```

```sql
-- Tables live on object storage (S3/GCS/Azure/MinIO)
CREATE TABLE events (
  ts      TIMESTAMP,
  user_id BIGINT,
  event   STRING,
  payload VARIANT
);

-- Ingest from S3 in one line
COPY INTO events FROM 's3://my-bucket/events/'
  FILE_FORMAT = (TYPE = PARQUET);

-- Query with standard SQL + Snowflake-style functions
SELECT event, COUNT(*)
FROM events
WHERE ts >= '2026-04-01'
GROUP BY event
ORDER BY 2 DESC;
```

## Introduction

Databend is a modern analytical data warehouse built from scratch in Rust, with an architecture inspired by Snowflake: **stateless compute nodes** that read and write data in object storage. That means you pay for bytes in S3 (cheap) and scale compute up and down on demand, with no local storage to manage.

With over 9,000 GitHub stars, Databend is used by teams looking for an open-source Snowflake alternative. SQL compatibility is strong enough that many Snowflake queries move over unchanged.

## What Databend Does

Databend stores tables as open-format Parquet/ORC files in object storage and uses its own metadata service (backed by FoundationDB, MySQL, or Postgres) as the catalog.
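The HTTP REST endpoint shown in Quick Use can also be scripted. As a minimal sketch, the helper below builds (but does not send) the same `POST /v1/query` request the `curl` one-liner issues, using only the Python standard library; the function name `build_query_request` is illustrative, not part of any Databend client library.

```python
import base64
import json
from urllib import request

def build_query_request(sql: str,
                        host: str = "localhost",
                        port: int = 8000,
                        user: str = "root",
                        password: str = "") -> request.Request:
    """Build a POST to Databend's /v1/query endpoint (not sent here)."""
    url = f"http://{host}:{port}/v1/query"
    body = json.dumps({"sql": sql}).encode("utf-8")
    # Basic auth matching `curl -u root:` (empty password by default)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )

req = build_query_request("SELECT 1+1")
print(req.full_url)       # http://localhost:8000/v1/query
print(req.data.decode())  # {"sql": "SELECT 1+1"}
```

Against a running container, `urllib.request.urlopen(req)` would return the query result as JSON.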
Queries run through a vectorized engine that has been heavily optimized for late materialization, predicate pushdown, and filter caching.

## Architecture Overview

```
Clients (MySQL, HTTP, ClickHouse protocol)
        |
[Query Nodes (Databend)]   stateless, scale elastically
        |
[Meta Service]             table schemas, versions, auth
                           (FoundationDB / Postgres / MySQL as backend)
        |
[Object Storage]           S3 / GCS / Azure / OCI / MinIO / HDFS
                           Parquet/ORC + small index files
        |
[External Data Sources]    Iceberg, Hive, CSV/JSON/TSV, Kafka, Snowflake
```

## Self-Hosting & Configuration

```sql
-- Databend "warehouses" provide compute isolation per workload
CREATE WAREHOUSE etl WITH WAREHOUSE_SIZE = 'Large';
CREATE WAREHOUSE bi  WITH WAREHOUSE_SIZE = 'Medium';

-- Connect to a specific warehouse
USE WAREHOUSE etl;

-- Streaming-style COPY + transform + MERGE (CDC pattern)
COPY INTO raw_events FROM 's3://bucket/cdc/*.parquet'
  FORCE = true
  FILE_FORMAT = (TYPE = PARQUET);

MERGE INTO events AS tgt
USING (SELECT * FROM raw_events WHERE is_delete = false) AS src
  ON tgt.user_id = src.user_id AND tgt.ts = src.ts
WHEN MATCHED THEN
  UPDATE SET tgt.event = src.event
WHEN NOT MATCHED THEN
  INSERT (user_id, ts, event) VALUES (src.user_id, src.ts, src.event);
```

## Key Features

- **Stateless compute** — scale warehouses elastically, pay per use
- **Object storage first** — S3/GCS/Azure/MinIO; data is just Parquet files
- **MySQL wire protocol** — BI tools connect unchanged
- **Snowflake-like SQL** — VARIANT, time travel, MERGE, COPY INTO
- **Rust-native performance** — vectorized execution, modern CPU features
- **Streaming ingest** — CDC from Kafka, TableStreams API
- **Time travel** — query historical table versions by timestamp
- **Data sharing** — cross-account table shares like Snowflake

## Comparison with Similar Tools

| Feature | Databend | Snowflake | ClickHouse Cloud | BigQuery | DuckDB |
|---|---|---|---|---|---|
| Storage | Object storage | Proprietary | Object storage (cloud) | Proprietary | Local files |
| Compute model | Elastic warehouses | Elastic warehouses | Cloud-native clusters | Serverless | Single-process |
| SQL dialect | Snowflake-like | Snowflake | ClickHouse | BigQuery | DuckDB |
| Self-host | Yes | No | No (self-host core) | No | N/A (embedded) |
| Time travel | Yes | Yes | Limited | Yes (snapshots) | No |
| Best for | Open-source Snowflake alternative | Managed DW | Log-heavy analytics | GCP shops | Single-machine analytics |

## FAQ

**Q: Databend vs ClickHouse?**
A: Databend is architected for cloud-native storage-compute separation; ClickHouse is a high-performance local/cluster engine. Databend's SQL and storage model are closer to Snowflake's; ClickHouse is closer to columnar MPP systems. Pick Databend for S3-first workflows, ClickHouse for raw speed on local disks.

**Q: How mature is Databend?**
A: v1.x since 2023, with active monthly releases. It is used in production by several Chinese and international teams, and the SQL surface is broad enough for real analytics workloads.

**Q: Can it replace Snowflake?**
A: For many mid-sized analytical workloads, yes — and you own the infrastructure. For teams deeply integrated with Snowflake's ecosystem (Snowpark, etc.), the transition is harder.

**Q: Is Databend truly open source?**
A: The core is Apache-2.0. Databend Cloud (managed) is a paid service. The project is actively maintained by Datafuse Labs.

## Sources

- GitHub: https://github.com/databendlabs/databend
- Docs: https://docs.databend.com
- Company: Datafuse Labs
- License: Apache-2.0