Introduction
RocksDB is an embeddable persistent key-value store written in C++, forked from LevelDB at Facebook in 2012 and tuned for flash storage and server workloads. It powers MyRocks, CockroachDB, TiKV, YugabyteDB, Kafka Streams state stores, and countless custom data systems because it removes the need to write a durable, concurrent, crash-safe LSM engine from scratch.
What RocksDB Does
- Stores ordered byte-string keys with atomic Put/Get/Delete/Merge operations and range iteration.
- Uses a log-structured merge tree with memtables, immutable SST files and leveled/universal compaction.
- Provides column families so one process can host many independent keyspaces sharing a WAL.
- Supports snapshots, transactions (optimistic & pessimistic) and checkpoints for consistent backups.
- Exposes thousands of tuning knobs (block cache, bloom filters, rate limiter, compression) for SSD, HDD or RAM-heavy setups.
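The core contract in the first bullet, an ordered map over byte-string keys with point operations and range iteration, can be sketched in a few lines. This is a toy Python illustration of the interface shape, not the RocksDB API or its on-disk implementation; the class and method names are invented for the sketch:

```python
import bisect

class ToyOrderedKV:
    """Toy ordered byte-key map mimicking Put/Get/Delete and range scans.
    Illustrates the interface only; RocksDB stores this in an LSM tree."""
    def __init__(self):
        self._keys = []   # sorted list of byte-string keys
        self._vals = {}   # key -> value

    def put(self, key: bytes, value: bytes):
        if key not in self._vals:
            bisect.insort(self._keys, key)  # keep keys ordered
        self._vals[key] = value

    def get(self, key: bytes):
        return self._vals.get(key)

    def delete(self, key: bytes):
        if key in self._vals:
            del self._vals[key]
            self._keys.remove(key)

    def scan(self, lo: bytes, hi: bytes):
        """Yield (key, value) pairs with lo <= key < hi, in key order."""
        i = bisect.bisect_left(self._keys, lo)
        while i < len(self._keys) and self._keys[i] < hi:
            k = self._keys[i]
            yield k, self._vals[k]
            i += 1

db = ToyOrderedKV()
db.put(b"user:1", b"alice")
db.put(b"user:2", b"bob")
db.put(b"order:9", b"widget")
print(list(db.scan(b"user:", b"user;")))  # all keys under the "user:" prefix
```

Because keys are ordered bytes, prefix scans fall out of plain range iteration, which is why composite keys like `user:<id>` work well in RocksDB.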
Architecture Overview
Writes enter a write-ahead log and an in-memory skiplist memtable. When a memtable fills, it is flushed to a sorted string table (SST) on disk; SSTs are organized into levels, where background compaction merges and reorders data to keep reads fast. A block cache holds hot SST blocks, bloom filters skip SSTs that cannot contain a key, and rate limiters throttle I/O so compaction does not starve foreground traffic.
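The flush-and-shadow behavior described above can be sketched as a toy LSM in Python. This is a deliberately minimal model (assumed names, no WAL, no bloom filters, no compaction, linear scans instead of binary search) meant only to show why reads check the memtable first and then runs from newest to oldest:

```python
class ToyLSM:
    """Minimal LSM sketch: a mutable memtable that flushes into immutable,
    sorted runs (stand-ins for SST files). Reads consult the memtable first,
    then runs from newest to oldest, so newer writes shadow older ones."""
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.ssts = []                    # list of sorted [(key, value)] runs
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable run and start fresh.
        self.ssts.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:          # freshest data lives in memory
            return self.memtable[key]
        for run in reversed(self.ssts):   # newest run wins over older ones
            for k, v in run:
                if k == key:
                    return v
        return None

lsm = ToyLSM(memtable_limit=2)
lsm.put("a", 1)
lsm.put("b", 2)   # hits the limit, triggers a flush to an immutable run
lsm.put("a", 3)   # newer value in the memtable shadows the flushed one
print(lsm.get("a"))  # 3
```

Real compaction exists precisely to bound the `for run in reversed(...)` cost: merging runs keeps the number of places a read must look small.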
Self-Hosting & Configuration
- Link against librocksdb or one of the official bindings (Java, Go, Rust, Python, Node.js).
- Pick a compaction style: leveled for read-heavy, universal for write-heavy, FIFO for logs/metrics.
- Size write_buffer_size, max_write_buffer_number and level0_slowdown_writes_trigger for your SSD bandwidth.
- Enable BlobDB when values are much larger than keys to avoid rewriting them during compaction.
- Tune block_cache, bloom filter bits/key and direct I/O separately for reads vs. compaction.
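The memtable knobs above interact: each column family can hold up to max_write_buffer_number memtables of write_buffer_size bytes in RAM at once, so the budget multiplies across column families. A back-of-envelope sketch, where the 64 MiB and the counts are illustrative assumptions rather than recommendations:

```python
# Back-of-envelope memtable memory budget; all numbers are illustrative.
MIB = 1024 * 1024
write_buffer_size = 64 * MIB       # size of one memtable
max_write_buffer_number = 4        # mutable + immutable memtables kept in RAM
column_families = 3                # each CF has its own memtables

per_cf = write_buffer_size * max_write_buffer_number
total = per_cf * column_families
print(f"per-CF memtable budget: {per_cf // MIB} MiB")   # 256 MiB
print(f"total across CFs:       {total // MIB} MiB")    # 768 MiB
```

Raising these knobs absorbs write bursts but grows recovery time and RAM use, which is why they are tuned against SSD bandwidth rather than maximized.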
Key Features
- Merge operators that make counters and CRDT-like aggregates blind writes, avoiding read-modify-write round trips.
- Pessimistic and optimistic transactions with serializable snapshots.
- Online backups, incremental checkpoints and replication-friendly WAL shipping.
- Rate limiters, write stalls and I/O priority classes so compaction respects SLOs.
- Column families, prefix extractors and iterator pinning for efficient composite-key workloads.
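The merge-operator idea in the first bullet can be sketched conceptually: writers append merge operands without reading the current value, and the full value is computed by folding the operands at read (or, in RocksDB, compaction) time. A toy Python sketch of an additive counter, with invented names; the real interface is RocksDB's C++ MergeOperator:

```python
class ToyMergeStore:
    """Sketch of merge-operator semantics for an additive counter:
    merge() is a blind append; the fold happens lazily on get()."""
    def __init__(self):
        self.base = {}       # key -> committed integer value
        self.operands = {}   # key -> list of pending increments

    def merge(self, key, delta):
        # Blind write: no read of the current value is required.
        self.operands.setdefault(key, []).append(delta)

    def get(self, key):
        # Full merge: fold the pending operands onto the base value,
        # the way RocksDB would during a read or compaction.
        return self.base.get(key, 0) + sum(self.operands.get(key, []))

store = ToyMergeStore()
for _ in range(3):
    store.merge("page_views", 1)
print(store.get("page_views"))  # 3
```

The payoff is that concurrent writers never race on a read-modify-write cycle; the merge function must simply be associative so operands can be folded in any grouping.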
Comparison with Similar Tools
- LevelDB — the ancestor; a single background compaction thread and far fewer tuning options.
- LMDB — mmap B-tree; great reads, weaker on write-heavy workloads.
- BadgerDB — pure-Go LSM, no CGO, but fewer features than RocksDB.
- WiredTiger — MongoDB's engine; hybrid B-tree/LSM with different trade-offs.
- Sled / Fjall — modern Rust embedded KVs; simpler API, smaller ecosystem.
FAQ
Q: Is RocksDB a database or a library?
A: A library. It gives you an engine; you implement the server, schema and networking on top.
Q: Does RocksDB support replication?
A: Not built in. Ship the WAL, use checkpoints, or build on a consensus system like Raft (as TiKV does).
Q: When should I pick LMDB instead?
A: When your workload is read-dominant, fits mostly in RAM and you want zero-copy mmap reads.
Q: Is it safe to run RocksDB in production on a single node?
A: Yes. MyRocks and CockroachDB run it at massive scale. Tune compaction and monitor write stalls.