# CrateDB — Distributed SQL Database for Machine Data

> CrateDB is a distributed SQL database optimized for machine data, IoT, and time-series workloads. Built on a shared-nothing architecture, it combines the familiarity of SQL with the scalability of a distributed columnar store for real-time analytics on large datasets.

## Quick Use
```bash
docker run --publish 4200:4200 --publish 5432:5432 crate:latest
# Access the Admin UI at http://localhost:4200
# Connect via PostgreSQL wire protocol on port 5432
```
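
As a minimal sketch, assuming the container above is running and its HTTP endpoint is reachable at `http://localhost:4200/_sql`, a statement can also be sent from Python using only the standard library. The helper names `build_sql_request` and `run_sql` are illustrative and not part of any CrateDB client library.

```python
import json
import urllib.request

# Assumption: CrateDB listens on the port published by the docker command above.
CRATE_URL = "http://localhost:4200/_sql"


def build_sql_request(stmt, args=None, url=CRATE_URL):
    """Build (but do not send) a POST request carrying one SQL statement."""
    payload = {"stmt": stmt}
    if args is not None:
        # Positional parameters are bound to `?` placeholders in the statement.
        payload["args"] = list(args)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(stmt, args=None, url=CRATE_URL, timeout=10):
    """Send the statement and return (column names, rows) from the JSON reply."""
    request = build_sql_request(stmt, args, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["cols"], body["rows"]
```

With the container up, `run_sql("SELECT name FROM sys.cluster")` should return the column list and one row holding the cluster name.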

## Introduction
CrateDB was built for IoT and industrial use cases where millions of sensors generate time-stamped data that needs real-time SQL analytics. It distributes data across a cluster of nodes, letting you query terabytes of machine data with standard SQL without sacrificing write throughput.

## What CrateDB Does
- Executes standard SQL queries over distributed columnar storage
- Ingests millions of records per second across cluster nodes
- Supports full-text search via integrated Lucene-based indexing
- Handles nested JSON objects and arrays as first-class column types
- Provides a PostgreSQL wire protocol for compatibility with existing tools

## Architecture Overview
CrateDB uses a shared-nothing architecture in which each node stores a subset of the data in shards. The node that receives a query acts as its coordinator: it plans the query and executes it in parallel across the nodes that hold the relevant shards. Storage combines a columnar engine for analytics with an inverted index for full-text search. Cluster coordination uses a Raft-based consensus protocol for master election and metadata management.
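
The shard and coordinator roles described above can be illustrated with a toy model. This is a conceptual sketch only: the CRC32 hash, the shard count, and the function names are assumptions chosen for illustration, not CrateDB's actual routing or execution code, and the shards are processed one after another here rather than in parallel.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count for a single table


def shard_for(routing_key, num_shards=NUM_SHARDS):
    """Map a routing key to a shard with a stable hash (illustrative choice)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


def distribute(rows, num_shards=NUM_SHARDS):
    """Group rows by shard, routing on the row's primary key."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row["id"], num_shards)].append(row)
    return shards


def partial_aggregate(shard_rows):
    """Work done next to the data: per-sensor [sum, count] for one shard."""
    partial = defaultdict(lambda: [0.0, 0])
    for row in shard_rows:
        acc = partial[row["sensor_id"]]
        acc[0] += row["value"]
        acc[1] += 1
    return partial


def coordinator_avg(shards):
    """Merge the per-shard partials and finish AVG(value) GROUP BY sensor_id."""
    merged = defaultdict(lambda: [0.0, 0])
    for shard_rows in shards.values():
        for sensor_id, (total, count) in partial_aggregate(shard_rows).items():
            merged[sensor_id][0] += total
            merged[sensor_id][1] += count
    return {sensor: total / count for sensor, (total, count) in merged.items()}
```

Because each shard returns only sums and counts, the coordinator merges a small amount of data regardless of how many rows each shard scanned.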

## Self-Hosting & Configuration
- Deploy via Docker, Kubernetes Helm chart, or native Linux packages
- Configure cluster discovery with seed hosts in `crate.yml`
- Set the number of shards and replicas per table for data distribution
- Tune the JVM heap (`CRATE_HEAP_SIZE`), circuit breaker limits such as `indices.breaker.total.limit`, and thread pool sizes based on workload
- Enable SSL and authentication for production deployments
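
Shard and replica counts are set per table in the `CREATE TABLE` statement. The helper below is a minimal sketch that renders such a statement; the function name, table name, and columns are hypothetical, and identifiers are interpolated without escaping, so it should only be given trusted names.

```python
def create_table_ddl(table, columns, routing_column, shards, replicas):
    """Render a CREATE TABLE statement with explicit shard and replica counts."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    column_sql = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n  {column_sql}\n)\n"
        f"CLUSTERED BY ({routing_column}) INTO {shards} SHARDS\n"
        f"WITH (number_of_replicas = {replicas})"
    )


# Hypothetical sensor table: 6 shards, each with 1 replica.
EXAMPLE_DDL = create_table_ddl(
    "sensor_readings",
    [
        ("sensor_id", "TEXT"),
        ("ts", "TIMESTAMP WITH TIME ZONE"),
        ("value", "DOUBLE PRECISION"),
        ("payload", "OBJECT(DYNAMIC)"),
    ],
    routing_column="sensor_id",
    shards=6,
    replicas=1,
)
```

The resulting string can be pasted into the Admin UI console or sent through any PostgreSQL-compatible client.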

## Key Features
- Standard SQL with JOINs, aggregations, and window functions on distributed data
- Columnar storage with automatic indexing for fast analytical queries
- Geospatial data types and queries for location-based IoT applications
- Built-in Admin UI for cluster monitoring, query profiling, and management
- PostgreSQL wire protocol compatibility with drivers and BI tools

## Comparison with Similar Tools
- **TimescaleDB** — PostgreSQL extension for time series; CrateDB is a standalone distributed system with full-text search
- **ClickHouse** — columnar analytics DB; CrateDB adds full-text search and PostgreSQL compatibility
- **Elasticsearch** — search engine with analytics; CrateDB provides proper SQL and relational capabilities
- **QuestDB** — high-performance time-series with SQL; CrateDB handles broader workloads with distributed joins
- **InfluxDB** — purpose-built for metrics; CrateDB uses standard SQL and supports richer data types

## FAQ
**Q: Is CrateDB compatible with PostgreSQL?**
A: CrateDB implements the PostgreSQL wire protocol, so most PostgreSQL drivers and tools work. However, it does not support every PostgreSQL SQL feature; transactions, for example, are not supported.

**Q: Does CrateDB support transactions?**
A: CrateDB provides atomicity at the row level but does not support multi-row ACID transactions. It is designed for analytical and append-heavy workloads.

**Q: How does CrateDB handle scaling?**
A: Add nodes to the cluster and CrateDB automatically rebalances shards. No manual resharding is required.
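
To give an intuition for what rebalancing means, here is a toy model that evens out shard counts after a node joins. It is a simplified illustration under stated assumptions: the function name, node names, and the greedy strategy are hypothetical and do not reflect CrateDB's actual shard allocator, which also accounts for replicas and other constraints.

```python
def rebalance(assignment, new_node):
    """Return a new node -> shards mapping after adding an empty node.

    Shards move one at a time from the fullest node to the emptiest node
    until the counts differ by at most one.
    """
    nodes = {node: list(shards) for node, shards in assignment.items()}
    nodes[new_node] = []
    while True:
        busiest = max(nodes, key=lambda n: len(nodes[n]))
        idlest = min(nodes, key=lambda n: len(nodes[n]))
        if len(nodes[busiest]) - len(nodes[idlest]) <= 1:
            return nodes
        nodes[idlest].append(nodes[busiest].pop())
```

Starting from two nodes with three shards each, adding a third node in this model ends with two shards on every node, and only two shards have moved.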

**Q: Is there a managed cloud offering?**
A: Yes. CrateDB Cloud provides a managed service on AWS, Azure, and GCP with automated operations.

## Sources
- https://github.com/crate/crate
- https://cratedb.com/docs

---
Source: https://tokrepo.com/en/workflows/77df7ba5-3b64-11f1-9bc6-00163e2b0d79
Author: AI Open Source