Scripts · Apr 21, 2026 · 3 min read

Apache IoTDB — Time-Series Database for Internet of Things

Lightweight time-series database designed for high-throughput IoT data ingestion with SQL-like query support, built to run on both edge devices and cloud clusters.

Introduction

Apache IoTDB is a time-series database purpose-built for IoT scenarios. Developed at Tsinghua University and donated to the Apache Software Foundation, it handles high-frequency sensor data ingestion while maintaining efficient compression and query performance on resource-constrained devices.

What Apache IoTDB Does

  • Ingests millions of time-series data points per second from sensors and devices
  • Compresses time-series data with encoding schemes like Gorilla, RLE, and dictionary encoding
  • Provides a SQL-like query language (IoTDB SQL) for aggregation, downsampling, and filtering
  • Supports both standalone single-node and distributed multi-node cluster deployments
  • Integrates with Apache Spark, Flink, and Kafka for stream and batch processing pipelines
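
To make the encoding bullet concrete, here is a minimal Python sketch of two of the schemes named above, dictionary encoding and run-length encoding (RLE), applied to a repetitive status column. It illustrates the general techniques only; it is not IoTDB's implementation, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def dict_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each distinct string with a small integer code."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into the original sequence."""
    out: List[int] = []
    for value, count in runs:
        out.extend([value] * count)
    return out


# Hypothetical device-state readings; long runs are typical of slowly changing sensors.
states = ["RUN", "RUN", "RUN", "IDLE", "IDLE", "RUN"]
dictionary, codes = dict_encode(states)
runs = rle_encode(codes)
```

Here six strings shrink to a two-entry dictionary plus three runs. Real encoders work at the bit level, but the idea of exploiting repetition is the same.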

Architecture Overview

IoTDB uses a tree-based metadata model in which devices and measurements form a hierarchical path (e.g., root.factory.device1.temperature). Data is stored in TsFiles, a columnar format optimized for time-series data with per-column encoding and compression. Writes are buffered in a MemTable before being flushed to disk. In distributed mode, ConfigNodes handle metadata consensus and DataNodes handle storage, coordinated via the Raft protocol.
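
The write path described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: `ToyMemTable`, `split_path`, and the point-count threshold are hypothetical names and simplifications, and an in-memory list stands in for TsFiles on disk.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (timestamp in ms, value)


def split_path(path: str) -> Tuple[str, str]:
    """Split 'root.factory.device1.temperature' into (device path, measurement)."""
    device, _, measurement = path.rpartition(".")
    if not device.startswith("root") or not measurement:
        raise ValueError(f"not a full series path: {path!r}")
    return device, measurement


class ToyMemTable:
    """Buffers writes in memory and flushes a time-sorted chunk when full."""

    def __init__(self, flush_threshold: int) -> None:
        self.flush_threshold = flush_threshold  # assumed unit: number of points
        self.buffer: Dict[str, List[Point]] = defaultdict(list)
        self.size = 0
        self.flushed: List[Dict[str, List[Point]]] = []  # stand-in for files on disk

    def write(self, path: str, timestamp: int, value: float) -> None:
        split_path(path)  # reject malformed paths before buffering
        self.buffer[path].append((timestamp, value))
        self.size += 1
        if self.size >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if self.size == 0:
            return
        # Sort each series by time so the flushed chunk supports range scans.
        chunk = {p: sorted(pts) for p, pts in self.buffer.items()}
        self.flushed.append(chunk)
        self.buffer = defaultdict(list)
        self.size = 0


table = ToyMemTable(flush_threshold=3)
table.write("root.factory.device1.temperature", 2000, 21.5)
table.write("root.factory.device1.temperature", 1000, 21.0)
table.write("root.factory.device1.humidity", 1000, 40.0)  # third write triggers a flush
```

After the third write the buffer is empty and `table.flushed` holds one chunk whose temperature points are ordered by timestamp, even though they arrived out of order.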

Self-Hosting & Configuration

  • Deploy via Docker, a downloaded binary release, or a source build with Maven and JDK 11+
  • Configure iotdb-system.properties for memory allocation and storage directories
  • Set wal_buffer_size and memtable_size_threshold to balance write throughput and memory usage
  • Enable compaction strategies (cross-space, inner-space) for long-term storage efficiency
  • Use ConfigNode and DataNode scripts separately for distributed cluster setup
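
As a small aid for the tuning step above, the following Python sketch parses properties-style text and checks that the two write-path settings are present. The key names are copied from the list above and may differ between IoTDB versions, and the values are hypothetical placeholders rather than recommendations; check both against the template shipped with your release.

```python
from typing import Dict, List

# Hypothetical fragment of iotdb-system.properties; values are placeholders.
SAMPLE = """\
# Hypothetical values for illustration; not recommendations.
wal_buffer_size=16777216
memtable_size_threshold=1073741824
"""

# Keys as named in the text above (assumed, not verified against a release).
REQUIRED_KEYS = ("wal_buffer_size", "memtable_size_threshold")


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(props: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent from the parsed properties."""
    return [k for k in REQUIRED_KEYS if k not in props]


props = parse_properties(SAMPLE)
```

A check like `missing_keys(props)` can run before a node starts, so a misspelled or absent key is caught early instead of silently falling back to a default.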

Key Features

  • Tree-structured metadata model maps naturally to IoT device hierarchies
  • Time-series-specific encoding can reach 10-30x compression ratios on sensor data, depending on how regular the data is
  • The aligned timeseries feature efficiently stores multiple measurements that share the same timestamps
  • Trigger and continuous query mechanisms support real-time alerting and downsampling
  • Edge-cloud sync allows lightweight edge instances to replicate data to central clusters

Comparison with Similar Tools

  • InfluxDB: Popular time-series DB with the Flux query language; IoTDB is positioned as offering better compression for high-cardinality IoT data
  • TimescaleDB: PostgreSQL extension for time-series; stronger SQL compatibility, but it carries the overhead of running PostgreSQL
  • TDengine: Also targets IoT with clustering and SQL support; IoTDB has broader Apache ecosystem integration
  • QuestDB: Optimized for fast SQL analytics on time-series; less focused on IoT device hierarchy modeling
  • Prometheus: Pull-based metrics collection; designed for monitoring rather than general IoT data storage

FAQ

Q: What query language does IoTDB use? A: IoTDB uses its own SQL-like dialect (IoTDB SQL) that supports time-range filters, aggregation functions, GROUP BY time intervals, and FILL clauses for missing data interpolation.
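
To show what GROUP BY time intervals and FILL compute, here is a rough Python equivalent: an average per fixed-width time bucket, followed by a previous-value fill for empty buckets. This sketches the semantics only, not IoTDB's query engine; the function names are hypothetical, and previous-value filling is just one possible fill strategy.

```python
from typing import List, Optional, Tuple

Bucket = Tuple[int, Optional[float]]  # (bucket start time, average or None)


def group_by_time_avg(
    points: List[Tuple[int, float]], start: int, end: int, interval: int
) -> List[Bucket]:
    """Average the values in each [t, t + interval) bucket over [start, end)."""
    n = (end - start + interval - 1) // interval
    sums = [0.0] * n
    counts = [0] * n
    for ts, value in points:
        if start <= ts < end:
            i = (ts - start) // interval
            sums[i] += value
            counts[i] += 1
    return [
        (start + i * interval, sums[i] / counts[i] if counts[i] else None)
        for i in range(n)
    ]


def fill_previous(buckets: List[Bucket]) -> List[Bucket]:
    """Replace empty buckets with the most recent non-empty value, if any."""
    out: List[Bucket] = []
    last: Optional[float] = None
    for ts, value in buckets:
        if value is None:
            value = last
        else:
            last = value
        out.append((ts, value))
    return out


# Hypothetical readings in milliseconds; the 1000-1999 ms bucket has no data.
readings = [(0, 20.0), (500, 22.0), (2100, 30.0)]
buckets = group_by_time_avg(readings, start=0, end=3000, interval=1000)
filled = fill_previous(buckets)
```

The middle bucket comes back as `None` from the aggregation and takes the value 21.0 after filling, which is the gap-handling behavior a FILL clause is meant to provide.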

Q: Can IoTDB run on edge devices? A: Yes. The standalone mode has a small memory footprint and can run on ARM-based devices. Edge instances can sync data to a central cloud cluster.

Q: How does IoTDB handle schema? A: IoTDB supports both schema-on-write (explicitly create timeseries) and auto-creation mode where schemas are inferred from incoming data. The tree model organizes measurements hierarchically.
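
The difference between the two schema modes can be illustrated with a toy registry in Python. `ToySchemaRegistry` and `infer_type` are hypothetical, and the mapping from Python values to type names is an assumption made for this sketch; how a real server infers types depends on its configuration.

```python
from typing import Dict


def infer_type(value: object) -> str:
    """Map a Python value to an illustrative type name (assumed mapping)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT64"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "TEXT"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


class ToySchemaRegistry:
    """Tracks series paths and their types, with optional auto-creation."""

    def __init__(self, auto_create: bool) -> None:
        self.auto_create = auto_create
        self.series: Dict[str, str] = {}

    def create_timeseries(self, path: str, data_type: str) -> None:
        """Explicit registration, as in schema-on-write."""
        self.series[path] = data_type

    def insert(self, path: str, value: object) -> str:
        """Validate a write against the schema; return the series type."""
        inferred = infer_type(value)
        declared = self.series.get(path)
        if declared is None:
            if not self.auto_create:
                raise KeyError(f"timeseries not created: {path}")
            self.series[path] = inferred  # schema inferred from the first write
            return inferred
        if declared != inferred:
            raise TypeError(f"{path} is {declared}, got {inferred}")
        return declared


auto = ToySchemaRegistry(auto_create=True)
auto.insert("root.factory.device1.temperature", 21.5)

strict = ToySchemaRegistry(auto_create=False)
strict.create_timeseries("root.factory.device1.running", "BOOLEAN")
strict.insert("root.factory.device1.running", True)
```

With auto-creation on, the first write defines the series; with it off, a write to an undeclared path is rejected until the series is created explicitly.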

Q: What is the TsFile format? A: TsFile is IoTDB's native columnar file format designed for time-series data. It stores data sorted by time with per-column encoding, enabling efficient range scans and compression.
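
Why time-sorted columnar storage helps range scans can be seen in a short Python sketch. `ToyColumnChunk` is a hypothetical in-memory stand-in and not the TsFile layout: it keeps timestamps and values in separate, time-ordered columns and uses binary search to locate the requested window.

```python
from bisect import bisect_left
from typing import List, Tuple


class ToyColumnChunk:
    """One series stored as two parallel columns, sorted by timestamp."""

    def __init__(self, points: List[Tuple[int, float]]) -> None:
        ordered = sorted(points)
        self.timestamps = [t for t, _ in ordered]
        self.values = [v for _, v in ordered]

    def range_scan(self, start: int, end: int) -> List[Tuple[int, float]]:
        """Return points with start <= timestamp < end via binary search."""
        lo = bisect_left(self.timestamps, start)
        hi = bisect_left(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))


chunk = ToyColumnChunk([(30, 3.0), (10, 1.0), (20, 2.0), (40, 4.0)])
window = chunk.range_scan(15, 40)
```

Because the timestamp column is sorted, the scan touches only the slice between the two search positions instead of reading every point.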
