Introduction
CrateDB was built for IoT and industrial use cases where millions of sensors generate time-stamped data that needs real-time SQL analytics. It distributes data across a cluster of nodes, letting you query terabytes of machine data with standard SQL without sacrificing write throughput.
What CrateDB Does
- Executes standard SQL queries over distributed columnar storage
- Ingests millions of records per second across cluster nodes
- Supports full-text search via integrated Lucene-based indexing
- Handles nested JSON objects and arrays as first-class column types
- Provides a PostgreSQL wire protocol for compatibility with existing tools
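As a rough illustration of the nested-object support listed above, the sketch below builds the subscript expressions CrateDB SQL uses to reach into object columns. The table `readings`, the column `payload`, and the helper `subscript` are hypothetical names chosen for this example; only the `column['key']` subscript form and the `OBJECT` column type come from CrateDB itself.

```python
# Hypothetical schema for illustration: one timestamp and one nested object column.
CREATE_READINGS = (
    "CREATE TABLE readings ("
    "ts TIMESTAMP WITH TIME ZONE, "
    "payload OBJECT(DYNAMIC))"
)


def subscript(column: str, *keys: str) -> str:
    """Build a subscript expression such as payload['sensor']['temperature']."""
    for key in keys:
        # Keep the sketch simple: refuse keys that would need quoting or escaping.
        if "'" in key or not key:
            raise ValueError(f"unsupported key: {key!r}")
    return column + "".join(f"['{key}']" for key in keys)


def latest_values_query(table: str, column: str, path: list[str], limit: int) -> str:
    """Return a SELECT that reads one nested field from the newest rows."""
    field = subscript(column, *path)
    return (
        f"SELECT ts, {field} AS value "
        f"FROM {table} ORDER BY ts DESC LIMIT {int(limit)}"
    )


query = latest_values_query("readings", "payload", ["sensor", "temperature"], 10)
print(query)
```

The generated statement can be sent through any client that speaks the PostgreSQL wire protocol or the CrateDB HTTP endpoint; the sketch stops at building the SQL text so it runs without a cluster.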
Architecture Overview
CrateDB uses a shared-nothing architecture where each node stores a subset of the data in shards. Queries are planned by a coordinator node and executed in parallel across data nodes. Storage combines a columnar engine for analytics with an inverted index for full-text search. Cluster coordination uses a Raft-based consensus protocol for master election and metadata management.
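The flow described above (rows spread over shards, partial work done per shard, results merged by a coordinator) can be sketched as a small in-memory simulation. This is a toy model under stated assumptions, not CrateDB's implementation: the CRC32-based routing, the shard count, and all function names are invented for illustration.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # assumed shard count for the toy table


def shard_for(routing_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing key to a shard (CRC32 is a stand-in hash, not CrateDB's)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


def insert(shards: list[list[dict]], row: dict) -> None:
    """Store the row on the shard selected by its id."""
    shards[shard_for(row["id"])].append(row)


def partial_avg(rows: list[dict]) -> dict[str, tuple[float, int]]:
    """Shard-local step: per-sensor (sum, count) over the rows on one shard."""
    acc: dict[str, list] = defaultdict(lambda: [0.0, 0])
    for row in rows:
        acc[row["sensor_id"]][0] += row["value"]
        acc[row["sensor_id"]][1] += 1
    return {sensor: (s, c) for sensor, (s, c) in acc.items()}


def coordinate_avg(shards: list[list[dict]]) -> dict[str, float]:
    """Coordinator step: merge the partial results from every shard."""
    totals: dict[str, list] = defaultdict(lambda: [0.0, 0])
    for shard_rows in shards:
        for sensor, (s, c) in partial_avg(shard_rows).items():
            totals[sensor][0] += s
            totals[sensor][1] += c
    return {sensor: s / c for sensor, (s, c) in totals.items()}


shards: list[list[dict]] = [[] for _ in range(NUM_SHARDS)]
for row in [
    {"id": "r1", "sensor_id": "s1", "value": 10.0},
    {"id": "r2", "sensor_id": "s1", "value": 20.0},
    {"id": "r3", "sensor_id": "s2", "value": 30.0},
    {"id": "r4", "sensor_id": "s2", "value": 30.0},
]:
    insert(shards, row)

averages = coordinate_avg(shards)
print(averages)
```

Each shard returns sums and counts rather than finished averages, so the merged result does not depend on how the rows happened to be distributed.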
Self-Hosting & Configuration
- Deploy via Docker, Kubernetes Helm chart, or native Linux packages
- Configure cluster discovery with seed hosts in crate.yml
- Set the number of shards and replicas per table for data distribution
- Tune indices.memory.total and thread pool sizes based on workload
- Enable SSL and authentication for production deployments
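To make the discovery and shard/replica settings above more concrete, here is a minimal sketch that renders a `crate.yml` fragment and a matching `CREATE TABLE` statement. The cluster name, host names, table name, and helper functions are assumptions made up for this example; the setting `discovery.seed_hosts` and the `CLUSTERED INTO ... SHARDS` / `number_of_replicas` clauses are the CrateDB pieces being illustrated. Check the values against the documentation for your CrateDB version before using them.

```python
def render_crate_yml(cluster_name: str, node_name: str, seed_hosts: list[str]) -> str:
    """Render a small crate.yml fragment with cluster discovery via seed hosts."""
    lines = [
        f"cluster.name: {cluster_name}",
        f"node.name: {node_name}",
        "discovery.seed_hosts:",
    ]
    lines += [f"  - {host}" for host in seed_hosts]
    return "\n".join(lines) + "\n"


def create_table_sql(table: str, columns_sql: str, shards: int, replicas: int) -> str:
    """Build a CREATE TABLE statement with explicit shard and replica counts."""
    if shards < 1 or replicas < 0:
        raise ValueError("shards must be >= 1 and replicas >= 0")
    return (
        f"CREATE TABLE {table} ({columns_sql}) "
        f"CLUSTERED INTO {shards} SHARDS "
        f"WITH (number_of_replicas = {replicas})"
    )


# Hypothetical three-node cluster; 4300 is the default transport port.
yml = render_crate_yml(
    "iot-cluster", "node-1", ["crate-1:4300", "crate-2:4300", "crate-3:4300"]
)
ddl = create_table_sql(
    "readings", "ts TIMESTAMP WITH TIME ZONE, value DOUBLE PRECISION", 6, 1
)
print(yml)
print(ddl)
```

With six shards and one replica, the table has twelve shard copies in total, which a three-node cluster can spread evenly.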
Key Features
- Standard SQL with JOINs, aggregations, and window functions on distributed data
- Columnar storage with automatic indexing for fast analytical queries
- Geospatial data types and queries for location-based IoT applications
- Built-in Admin UI for cluster monitoring, query profiling, and management
- PostgreSQL wire protocol compatibility with drivers and BI tools
Comparison with Similar Tools
- TimescaleDB — PostgreSQL extension for time series; CrateDB is a standalone distributed system with full-text search
- ClickHouse — columnar analytics DB; CrateDB adds full-text search and PostgreSQL compatibility
- Elasticsearch — search engine with analytics; CrateDB provides proper SQL and relational capabilities
- QuestDB — high-performance time-series with SQL; CrateDB handles broader workloads with distributed joins
- InfluxDB — purpose-built for metrics; CrateDB uses standard SQL and supports richer data types
FAQ
Q: Is CrateDB compatible with PostgreSQL? A: Partly. CrateDB implements the PostgreSQL wire protocol, so most PostgreSQL drivers and tools can connect to it. It does not implement every PostgreSQL SQL feature, however; transactions, for example, are not supported.
Q: Does CrateDB support transactions? A: CrateDB provides atomicity at the row level but does not support multi-row ACID transactions. It is designed for analytical and append-heavy workloads.
Q: How does CrateDB handle scaling? A: Add nodes to the cluster and CrateDB automatically rebalances shards. No manual resharding is required.
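The rebalancing mentioned in the answer above can be pictured with a toy model: when a node joins, shards move from the fullest nodes to the emptiest until the counts are even. This is only an illustration of the idea; the function `add_node` and the greedy strategy are assumptions for this sketch and do not describe CrateDB's actual shard allocator.

```python
def add_node(assignment: dict[str, list[int]], new_node: str) -> dict[str, list[int]]:
    """Return a new shard assignment after a node joins, evened out greedily."""
    result = {node: list(shards) for node, shards in assignment.items()}
    result[new_node] = []
    while True:
        busiest = max(result, key=lambda n: len(result[n]))
        idlest = min(result, key=lambda n: len(result[n]))
        # Stop once no node holds more than one shard above any other.
        if len(result[busiest]) - len(result[idlest]) <= 1:
            return result
        result[idlest].append(result[busiest].pop())


# Hypothetical cluster: 12 shards on three nodes, then a fourth node joins.
before = {
    "node-1": [0, 1, 2, 3],
    "node-2": [4, 5, 6, 7],
    "node-3": [8, 9, 10, 11],
}
after = add_node(before, "node-4")
print({node: len(shards) for node, shards in after.items()})
```

No shard is split or rewritten in this model; whole shards change owners, which is why the shard count chosen at table creation bounds how far a table can spread.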
Q: Is there a managed cloud offering? A: Yes. CrateDB Cloud provides a managed service on AWS, Azure, and GCP with automated operations.