# CrateDB: Distributed SQL Database for Machine Data

> CrateDB is a distributed SQL database optimized for machine data, IoT, and time-series workloads. Built on a shared-nothing architecture, it combines the familiarity of SQL with the scalability of a distributed columnar store for real-time analytics on large datasets.

## Quick Use

```bash
docker run --publish 4200:4200 --publish 5432:5432 crate:latest
# Access the Admin UI at http://localhost:4200
# Connect via PostgreSQL wire protocol on port 5432
```

## Introduction

CrateDB was built for IoT and industrial use cases where millions of sensors generate time-stamped data that needs real-time SQL analytics. It distributes data across a cluster of nodes, letting you query terabytes of machine data with standard SQL without sacrificing write throughput.

## What CrateDB Does

- Executes standard SQL queries over distributed columnar storage
- Ingests millions of records per second across cluster nodes
- Supports full-text search via integrated Lucene-based indexing
- Handles nested JSON objects and arrays as first-class column types
- Provides a PostgreSQL wire protocol for compatibility with existing tools

## Architecture Overview

CrateDB uses a shared-nothing architecture where each node stores a subset of the data in shards. Queries are planned by a coordinator node and executed in parallel across data nodes. Storage combines a columnar engine for analytics with an inverted index for full-text search. Cluster coordination uses a Raft-based consensus protocol for master election and metadata management.

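
The sharding and parallel execution described in the architecture overview can be sketched in plain Python. This is an illustration only, not CrateDB's implementation: the CRC32 routing hash, the shard count, and the names `shard_for`, `partial_avg`, and `avg_by_sensor` are assumptions made for this example, and worker threads stand in for data nodes.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # hypothetical shard count for one table


def shard_for(routing_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from a stable hash of the routing key (illustrative hash)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


def insert(shards, row):
    """Route a row to exactly one shard based on its sensor_id."""
    shards[shard_for(row["sensor_id"])].append(row)


def partial_avg(rows):
    """Data-node step: per-sensor [sum, count] over one shard's rows."""
    acc = defaultdict(lambda: [0.0, 0])
    for row in rows:
        acc[row["sensor_id"]][0] += row["temperature"]
        acc[row["sensor_id"]][1] += 1
    return acc


def avg_by_sensor(shards):
    """Coordinator step: fan out to all shards in parallel, then merge partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_avg, shards))
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for sensor_id, (total, count) in partial.items():
            merged[sensor_id][0] += total
            merged[sensor_id][1] += count
    return {sid: total / count for sid, (total, count) in merged.items()}


# Twelve readings from three hypothetical sensors, spread across the shards.
shards = [[] for _ in range(NUM_SHARDS)]
for i in range(12):
    insert(shards, {"sensor_id": f"s{i % 3}", "temperature": 20.0 + i})

# Roughly what SELECT sensor_id, AVG(temperature) ... GROUP BY sensor_id computes.
result = avg_by_sensor(shards)
print(result)
```

Each shard returns sums and counts rather than finished averages, so the coordinator can merge partial results correctly even when one group's rows are spread over several shards.
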
## Self-Hosting & Configuration

- Deploy via Docker, Kubernetes Helm chart, or native Linux packages
- Configure cluster discovery with seed hosts in `crate.yml`
- Set the number of shards and replicas per table for data distribution
- Tune `indices.memory.total` and thread pool sizes based on workload
- Enable SSL and authentication for production deployments

## Key Features

- Standard SQL with JOINs, aggregations, and window functions on distributed data
- Columnar storage with automatic indexing for fast analytical queries
- Geospatial data types and queries for location-based IoT applications
- Built-in Admin UI for cluster monitoring, query profiling, and management
- PostgreSQL wire protocol compatibility with drivers and BI tools

## Comparison with Similar Tools

- **TimescaleDB**: PostgreSQL extension for time series; CrateDB is a standalone distributed system with full-text search
- **ClickHouse**: columnar analytics DB; CrateDB adds full-text search and PostgreSQL compatibility
- **Elasticsearch**: search engine with analytics; CrateDB provides proper SQL and relational capabilities
- **QuestDB**: high-performance time-series with SQL; CrateDB handles broader workloads with distributed joins
- **InfluxDB**: purpose-built for metrics; CrateDB uses standard SQL and supports richer data types

## FAQ

**Q: Is CrateDB compatible with PostgreSQL?**
A: CrateDB implements the PostgreSQL wire protocol, so most PostgreSQL drivers and tools work. However, it does not support every PostgreSQL SQL feature; transactions, for example, are not available.

**Q: Does CrateDB support transactions?**
A: CrateDB provides atomicity at the row level but does not support multi-row ACID transactions. It is designed for analytical and append-heavy workloads.

**Q: How does CrateDB handle scaling?**
A: Add nodes to the cluster and CrateDB automatically rebalances shards. No manual resharding is required.

**Q: Is there a managed cloud offering?**
A: Yes.
CrateDB Cloud provides a managed service on AWS, Azure, and GCP with automated operations.

## Sources

- https://github.com/crate/crate
- https://cratedb.com/docs

---

Source: https://tokrepo.com/en/workflows/77df7ba5-3b64-11f1-9bc6-00163e2b0d79
Author: AI Open Source