Introduction
Apache IoTDB is a time-series database purpose-built for IoT scenarios. Developed at Tsinghua University and donated to the Apache Software Foundation, it handles high-frequency sensor data ingestion while maintaining efficient compression and query performance on resource-constrained devices.
What Apache IoTDB Does
- Ingests millions of time-series data points per second from sensors and devices
- Encodes and compresses time-series data using schemes such as Gorilla, RLE, and dictionary encoding
- Provides a SQL-like query language (IoTDB SQL) for aggregation, downsampling, and filtering
- Supports both standalone single-node and distributed multi-node cluster deployments
- Integrates with Apache Spark, Flink, and Kafka for stream and batch processing pipelines
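To give a feel for why run-length encoding (RLE) suits slowly changing sensor readings, here is a minimal Python sketch. It is an illustration only, not IoTDB's encoder; the function names rle_encode and rle_decode and the sample readings are made up for this example.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def rle_encode(values: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs: Iterable[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into the original sequence."""
    out: List[int] = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


# Hypothetical temperature readings that change rarely.
readings = [21, 21, 21, 21, 22, 22, 21, 21, 21]
encoded = rle_encode(readings)
print(encoded)  # [(21, 4), (22, 2), (21, 3)]
```

Nine readings shrink to three pairs here. The gain depends entirely on how often consecutive values repeat, which is why a time-series store typically lets the encoding be chosen per column.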
Architecture Overview
IoTDB uses a tree-based metadata model in which devices and measurements form a hierarchical path (e.g., root.factory.device1.temperature). Data is stored in TsFiles, a columnar format optimized for time series, with per-column encoding and compression. Writes are buffered in a MemTable before being flushed to disk. Distributed mode uses ConfigNodes for metadata consensus and DataNodes for storage, coordinated via the Raft protocol.
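The write path described above (buffer in memory, then flush sorted data to disk) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not IoTDB code: the class MemTableSketch, the max_points threshold, and the split_path helper are hypothetical, and the "disk" is just a list of dictionaries.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (timestamp in ms, value)


class MemTableSketch:
    """Toy write buffer: holds points per series path, flushes at a threshold."""

    def __init__(self, max_points: int) -> None:
        self.max_points = max_points  # hypothetical flush threshold
        self.buffer: Dict[str, List[Point]] = defaultdict(list)
        self.size = 0
        # Stands in for files on disk: one dict per flush.
        self.flushed: List[Dict[str, List[Point]]] = []

    def write(self, path: str, timestamp: int, value: float) -> None:
        self.buffer[path].append((timestamp, value))
        self.size += 1
        if self.size >= self.max_points:
            self.flush()

    def flush(self) -> None:
        # Sort each series by time so the flushed chunk supports ordered scans.
        chunk = {path: sorted(points) for path, points in self.buffer.items()}
        self.flushed.append(chunk)
        self.buffer = defaultdict(list)
        self.size = 0


def split_path(path: str) -> Tuple[str, str]:
    """Split a full series path into (device path, measurement name)."""
    device, _, measurement = path.rpartition(".")
    return device, measurement


table = MemTableSketch(max_points=3)
table.write("root.factory.device1.temperature", 2000, 21.5)
table.write("root.factory.device1.temperature", 1000, 21.0)  # arrives out of order
table.write("root.factory.device1.humidity", 1000, 40.0)     # third point triggers a flush

print(split_path("root.factory.device1.temperature"))
# ('root.factory.device1', 'temperature')
```

After the third write the buffer is empty and the flushed chunk holds the temperature points in timestamp order, even though they arrived out of order.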
Self-Hosting & Configuration
- Deploy via Docker, a downloaded binary release, or a build from source with Maven and JDK 11+
- Configure iotdb-system.properties for memory allocation and storage directories
- Set wal_buffer_size and memtable_size_threshold to balance write throughput and memory usage
- Enable compaction strategies (cross-space, inner-space) for long-term storage efficiency
- Use ConfigNode and DataNode scripts separately for distributed cluster setup
Key Features
- Tree-structured metadata model maps naturally to IoT device hierarchies
- Time-series specific encoding achieves 10-30x compression ratios on sensor data
- The aligned timeseries feature efficiently stores multiple measurements that share the same timestamps
- Trigger and continuous query mechanisms for real-time alerting and downsampling
- Edge-cloud sync allows lightweight edge instances to replicate data to central clusters
Comparison with Similar Tools
- InfluxDB — Popular time-series DB with Flux query language; IoTDB offers better compression for high-cardinality IoT data
- TimescaleDB — PostgreSQL extension for time-series; stronger SQL compatibility but requires PostgreSQL overhead
- TDengine — Also targets IoT with clustering and SQL support; IoTDB has broader Apache ecosystem integration
- QuestDB — Optimized for fast SQL analytics on time-series; less focused on IoT device hierarchy modeling
- Prometheus — Pull-based metrics collection; designed for monitoring rather than general IoT data storage
FAQ
Q: What query language does IoTDB use? A: IoTDB uses its own SQL-like dialect (IoTDB SQL) that supports time-range filters, aggregation functions, GROUP BY time intervals, and FILL clauses for missing data interpolation.
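To make the semantics of grouping by time interval and filling gaps concrete, the following Python sketch mimics them on an in-memory list. It is not IoTDB SQL and not how the engine evaluates queries; the function names, the half-open [start, end) window convention, and the "previous value" fill strategy are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, Optional[float]]  # (window start, aggregated value or None)


def downsample_avg(
    points: List[Tuple[int, float]], start: int, end: int, interval: int
) -> List[Row]:
    """Average values per fixed window over [start, end); None marks empty windows."""
    buckets: Dict[int, List[float]] = {}
    for ts, value in points:
        if start <= ts < end:
            window = start + ((ts - start) // interval) * interval
            buckets.setdefault(window, []).append(value)
    rows: List[Row] = []
    for window in range(start, end, interval):
        values = buckets.get(window)
        rows.append((window, sum(values) / len(values) if values else None))
    return rows


def fill_previous(rows: List[Row]) -> List[Row]:
    """Replace None with the most recent non-null value, if one exists."""
    filled: List[Row] = []
    last: Optional[float] = None
    for window, value in rows:
        if value is None:
            value = last
        else:
            last = value
        filled.append((window, value))
    return filled


# Hypothetical readings: (timestamp in ms, temperature).
points = [(0, 20.0), (500, 22.0), (2100, 25.0)]
rows = downsample_avg(points, start=0, end=3000, interval=1000)
print(rows)                 # [(0, 21.0), (1000, None), (2000, 25.0)]
print(fill_previous(rows))  # [(0, 21.0), (1000, 21.0), (2000, 25.0)]
```

The middle window has no data, so the aggregation yields a null that the fill step replaces with the preceding window's average.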
Q: Can IoTDB run on edge devices? A: Yes. The standalone mode has a small memory footprint and can run on ARM-based devices. Edge instances can sync data to a central cloud cluster.
Q: How does IoTDB handle schema? A: IoTDB supports both explicit schema creation, where timeseries are created before data is written, and an auto-creation mode in which schemas are inferred from incoming data. The tree model organizes measurements hierarchically.
Q: What is the TsFile format? A: TsFile is IoTDB's native columnar file format designed for time-series data. It stores data sorted by time with per-column encoding, enabling efficient range scans and compression.
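As a rough illustration of why time-sorted columnar storage makes range scans cheap, the sketch below keeps timestamps and values in two parallel lists and uses binary search to find the slice for a time range. It says nothing about the actual TsFile layout; range_scan and the sample data are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def range_scan(
    timestamps: List[int], values: List[float], start: int, end: int
) -> List[Tuple[int, float]]:
    """Return (timestamp, value) pairs with start <= timestamp < end.

    Assumes timestamps is sorted ascending and aligned index-by-index with values.
    """
    lo = bisect_left(timestamps, start)  # first index with timestamp >= start
    hi = bisect_left(timestamps, end)    # first index with timestamp >= end
    return list(zip(timestamps[lo:hi], values[lo:hi]))


# Two parallel "columns" for one hypothetical series.
timestamps = [1000, 2000, 3000, 4000]
temperature = [21.0, 21.5, 22.0, 22.5]

print(range_scan(timestamps, temperature, 2000, 4000))
# [(2000, 21.5), (3000, 22.0)]
```

Because the time column is sorted, locating the range takes two binary searches rather than a full scan, and only the matching slice of the value column is read.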