Configs · April 18, 2026 · 1 min read

CrateDB — Distributed SQL Database for Machine Data

CrateDB is a distributed SQL database optimized for machine data, IoT, and time-series workloads. Built on a shared-nothing architecture, it combines the familiarity of SQL with the scalability of a distributed columnar store for real-time analytics on large datasets.

Introduction

CrateDB was built for IoT and industrial use cases where millions of sensors generate time-stamped data that needs real-time SQL analytics. It distributes data across a cluster of nodes, letting you query terabytes of machine data with standard SQL without sacrificing write throughput.

What CrateDB Does

  • Executes standard SQL queries over distributed columnar storage
  • Ingests millions of records per second across cluster nodes
  • Supports full-text search via integrated Lucene-based indexing
  • Handles nested JSON objects and arrays as first-class column types
  • Provides a PostgreSQL wire protocol for compatibility with existing tools
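As a rough illustration of the nested-object, full-text, and bulk-ingest points above, the sketch below prepares a bulk insert for CrateDB's HTTP endpoint using only the Python standard library. The table `sensor_readings`, its columns, and the sample rows are hypothetical, and the URL assumes a local node listening on the default HTTP port 4200. Treat it as a minimal sketch rather than a recommended ingestion pipeline.

```python
import json
import urllib.request

# Hypothetical table for illustration: a nested OBJECT column plus a
# TEXT column with a full-text index.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS sensor_readings (
    ts TIMESTAMP WITH TIME ZONE,
    sensor_id TEXT,
    payload OBJECT(DYNAMIC),
    note TEXT INDEX USING FULLTEXT
)
"""

INSERT = (
    "INSERT INTO sensor_readings (ts, sensor_id, payload, note) "
    "VALUES (?, ?, ?, ?)"
)

# Nested fields are read with subscript syntax; MATCH uses the full-text index.
SEARCH = """
SELECT sensor_id, payload['temperature'] AS temperature
FROM sensor_readings
WHERE MATCH(note, 'overheating')
"""


def build_bulk_request(stmt, rows):
    """Encode one statement plus many parameter rows as a JSON request body."""
    payload = {"stmt": stmt, "bulk_args": [list(row) for row in rows]}
    return json.dumps(payload).encode("utf-8")


def post_sql(body, url="http://localhost:4200/_sql"):
    """Send a request body to the HTTP endpoint (requires a running node)."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Sample rows (made up for this example).
rows = [
    ("2026-04-18T10:00:00Z", "s-1", {"temperature": 71.5, "tags": ["line-a"]}, "motor overheating"),
    ("2026-04-18T10:00:01Z", "s-2", {"temperature": 40.2, "tags": []}, "nominal"),
]
body = build_bulk_request(INSERT, rows)
```

With a node running, `post_sql(build_bulk_request(CREATE_TABLE, []))` would not be the right call for DDL; a plain `{"stmt": ...}` body is enough there, while `post_sql(body)` sends all rows in a single round trip.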

Architecture Overview

CrateDB uses a shared-nothing architecture where each node stores a subset of the data in shards. Queries are planned by a coordinator node and executed in parallel across data nodes. Storage combines a columnar engine for analytics with an inverted index for full-text search. Cluster coordination uses a Raft-based consensus protocol for master election and metadata management.
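The scatter-gather pattern described above can be sketched as a toy simulation in plain Python. Nothing here is CrateDB code: the CRC32 routing hash, the round-robin shard placement, and the names `shard_for`, `node_for`, and `coordinator_avg` are assumptions chosen to keep the example small. The point is only that each data node aggregates its own shards and the coordinator merges the partial results.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4
NODES = ["node-1", "node-2"]


def shard_for(routing_key: str) -> int:
    """Map a routing key to a shard with a stable hash (CRC32, for illustration only)."""
    return zlib.crc32(routing_key.encode("utf-8")) % NUM_SHARDS


def node_for(shard: int) -> str:
    """Assign shards to nodes round-robin."""
    return NODES[shard % len(NODES)]


def ingest(rows):
    """Place each (sensor_id, value) row on the node that owns its shard."""
    storage = defaultdict(lambda: defaultdict(list))  # node -> shard -> rows
    for sensor_id, value in rows:
        shard = shard_for(sensor_id)
        storage[node_for(shard)][shard].append((sensor_id, value))
    return storage


def local_partial(shards):
    """Data-node step: compute (sum, count) over the shards held locally."""
    values = [value for rows in shards.values() for _, value in rows]
    return sum(values), len(values)


def coordinator_avg(storage):
    """Coordinator step: merge per-node partial results into a global average."""
    partials = [local_partial(shards) for shards in storage.values()]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


rows = [(f"sensor-{i}", float(i)) for i in range(10)]
storage = ingest(rows)
print(coordinator_avg(storage))  # 4.5
```

Shipping `(sum, count)` pairs instead of raw rows is what keeps the coordinator cheap: the amount of data it merges grows with the number of nodes, not with the number of rows.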

Self-Hosting & Configuration

  • Deploy via Docker, Kubernetes Helm chart, or native Linux packages
  • Configure cluster discovery with seed hosts in crate.yml
  • Set the number of shards and replicas per table for data distribution
  • Tune JVM heap size (CRATE_HEAP_SIZE), circuit breaker limits, and thread pool sizes based on workload
  • Enable SSL and authentication for production deployments
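To make the configuration bullets above more concrete, here is a small Python helper that renders a crate.yml fragment and a table definition with explicit shard and replica counts. The host names, cluster name, table name, and the helper functions themselves are hypothetical. The setting keys shown (`discovery.seed_hosts`, `cluster.initial_master_nodes`, `auth.host_based.enabled`) should be checked against the configuration reference for your CrateDB version, and SSL keystore settings are deliberately left out.

```python
def render_crate_yml(cluster_name, node_name, seed_hosts, master_nodes, require_auth=True):
    """Render a minimal crate.yml fragment as text (illustrative, not exhaustive)."""
    lines = [
        f"cluster.name: {cluster_name}",
        f"node.name: {node_name}",
        "network.host: _site_",
        "discovery.seed_hosts:",
    ]
    # Seed hosts use the transport port (4300 by default), not the HTTP port.
    lines += [f'  - "{host}"' for host in seed_hosts]
    lines.append("cluster.initial_master_nodes:")
    lines += [f"  - {name}" for name in master_nodes]
    if require_auth:
        lines.append("auth.host_based.enabled: true")
    return "\n".join(lines) + "\n"


def table_ddl(table, shards, replicas):
    """Build a CREATE TABLE statement with explicit shard and replica counts."""
    return (
        f"CREATE TABLE {table} (\n"
        "    ts TIMESTAMP WITH TIME ZONE,\n"
        "    sensor_id TEXT,\n"
        "    value DOUBLE PRECISION\n"
        f") CLUSTERED INTO {shards} SHARDS "
        f"WITH (number_of_replicas = {replicas})"
    )


# Hypothetical three-node cluster.
config_text = render_crate_yml(
    cluster_name="machine-data",
    node_name="crate-1",
    seed_hosts=["crate-1:4300", "crate-2:4300", "crate-3:4300"],
    master_nodes=["crate-1", "crate-2", "crate-3"],
)
ddl = table_ddl("metrics", shards=6, replicas=1)
print(config_text)
print(ddl)
```

The replica count can be changed later on an existing table, so starting with one replica and adjusting once the cluster size settles is a reasonable default.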

Key Features

  • Standard SQL with JOINs, aggregations, and window functions on distributed data
  • Columnar storage with automatic indexing for fast analytical queries
  • Geospatial data types and queries for location-based IoT applications
  • Built-in Admin UI for cluster monitoring, query profiling, and management
  • PostgreSQL wire protocol compatibility with drivers and BI tools

Comparison with Similar Tools

  • TimescaleDB — PostgreSQL extension for time series; CrateDB is a standalone distributed system with full-text search
  • ClickHouse — columnar analytics DB; CrateDB adds full-text search and PostgreSQL compatibility
  • Elasticsearch — search engine with analytics; CrateDB provides proper SQL and relational capabilities
  • QuestDB — high-performance time-series with SQL; CrateDB handles broader workloads with distributed joins
  • InfluxDB — purpose-built for metrics; CrateDB uses standard SQL and supports richer data types

FAQ

Q: Is CrateDB compatible with PostgreSQL? A: Partially. CrateDB implements the PostgreSQL wire protocol, so most PostgreSQL drivers and tools can connect to it. It does not implement every PostgreSQL feature, however; transactions, for example, are not supported.
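A minimal sketch of what wire-protocol compatibility looks like from Python, assuming the third-party `psycopg2` driver is installed and a node is reachable on the default PostgreSQL port 5432. The user `crate` and schema `doc` are CrateDB defaults, while the `readings` table and the helper names are hypothetical. The driver is imported lazily so the rest of the snippet loads without it.

```python
def build_dsn(host="localhost", port=5432, user="crate", dbname="doc"):
    """Build a PostgreSQL-style connection URI for a CrateDB node."""
    return f"postgresql://{user}@{host}:{port}/{dbname}"


# A system-table query that needs no user tables.
CLUSTER_NAME_QUERY = "SELECT name FROM sys.cluster"

# Window function over a hypothetical table readings(sensor_id, ts, temperature).
RUNNING_AVG_QUERY = """
SELECT sensor_id,
       ts,
       temperature,
       AVG(temperature) OVER (PARTITION BY sensor_id ORDER BY ts) AS running_avg
FROM readings
"""


def run_query(dsn, sql):
    """Run one statement through a PostgreSQL driver and return all rows."""
    import psycopg2  # third-party; imported here so the module loads without it

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()


dsn = build_dsn()
print(dsn)  # postgresql://crate@localhost:5432/doc
```

Against a running node, `run_query(dsn, CLUSTER_NAME_QUERY)` should return a single row with the cluster name. Code that relies on rollback will not behave as it does on PostgreSQL, since CrateDB has no transactions to roll back.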

Q: Does CrateDB support transactions? A: CrateDB provides atomicity at the row level but does not support multi-row ACID transactions. It is designed for analytical and append-heavy workloads.

Q: How does CrateDB handle scaling? A: Add nodes to the cluster and CrateDB automatically rebalances shards. No manual resharding is required.
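To give an intuition for what rebalancing means here, the toy function below evens out shard counts after a node joins. It is not CrateDB's allocator, which also weighs factors such as replicas and disk usage; the greedy strategy and all names are assumptions made for illustration.

```python
def rebalance(assignment, nodes):
    """Move shards from the fullest node to the emptiest until counts differ by at most one.

    assignment maps node name -> list of shard ids and is modified in place.
    Returns the list of (shard, source_node, target_node) moves.
    """
    for node in nodes:
        assignment.setdefault(node, [])
    moves = []
    while True:
        fullest = max(nodes, key=lambda n: len(assignment[n]))
        emptiest = min(nodes, key=lambda n: len(assignment[n]))
        if len(assignment[fullest]) - len(assignment[emptiest]) <= 1:
            return moves
        shard = assignment[fullest].pop()
        assignment[emptiest].append(shard)
        moves.append((shard, fullest, emptiest))


# Six shards on two nodes; node-3 joins empty.
cluster = {"node-1": [0, 1, 2], "node-2": [3, 4, 5]}
moves = rebalance(cluster, ["node-1", "node-2", "node-3"])
print(moves)  # [(2, 'node-1', 'node-3'), (5, 'node-2', 'node-3')]
print({node: len(shards) for node, shards in cluster.items()})
# {'node-1': 2, 'node-2': 2, 'node-3': 2}
```

Only two of the six shards move in this run, which is the practical benefit of shard-level rebalancing: adding a node relocates a fraction of the data rather than rewriting the whole table.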

Q: Is there a managed cloud offering? A: Yes. CrateDB Cloud provides a managed service on AWS, Azure, and GCP with automated operations.
