What Cassandra Does
- Wide-column model — partition key + clustering keys + columns
- CQL query language — SQL-like declarative syntax
- Masterless peer-to-peer — all nodes equal (no single point of failure)
- Tunable consistency — per-query consistency level (ONE, QUORUM, ALL)
- Multi-DC replication — NetworkTopologyStrategy
- Lightweight transactions (LWT) — Paxos-based compare-and-set
- Materialized views — denormalized auto-maintained tables
- TTL — per-cell time-to-live
- Gossip protocol — peer state distribution
- Compaction strategies — STCS, LCS, TWCS for different workloads
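The tunable consistency item above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical helpers, not part of Cassandra or any driver. It assumes the usual rule of thumb that a read is guaranteed to touch at least one replica holding the latest acknowledged write when read replicas + write replicas > replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Size of a QUORUM: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def required_acks(level: str, replication_factor: int) -> int:
    """Replicas that must respond for the three levels named in the list above."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def read_overlaps_write(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True when every read replica set must intersect every write replica set (R + W > RF)."""
    w = required_acks(write_level, replication_factor)
    r = required_acks(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor for the example
    for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        overlap = read_overlaps_write(write_level, read_level, rf)
        print(f"write={write_level:<6} read={read_level:<6} overlap guaranteed: {overlap}")
```

With a replication factor of 3, writing and reading at QUORUM (2 + 2 > 3) gives the overlap, while ONE/ONE (1 + 1) does not; that trade between latency and freshness is what "tunable" refers to.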
Architecture
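Peer-to-peer ring: each node owns a range of tokens, and consistent hashing of the partition key decides which range a row falls in. Each partition is replicated to N nodes, where N is the keyspace's replication factor. A write is appended to the commit log and applied to an in-memory memtable, which is later flushed to immutable on-disk SSTables. A read merges the memtable with the relevant SSTables, using bloom filters to skip SSTables that cannot contain the partition and, when enabled, the row cache to avoid disk reads.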
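To illustrate how a partition key maps to replicas on the ring, here is a minimal sketch. It is not Cassandra's implementation: the `Ring` class is hypothetical, it gives each node a single token (real clusters usually assign many virtual nodes per host), it hashes with MD5 from the standard library rather than Cassandra's default Murmur3 partitioner, and it places replicas on the next nodes clockwise without any rack or datacenter awareness.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string onto the ring. MD5 stands in for Murmur3 (assumption for this sketch)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy token ring: one token per node, replicas chosen by walking clockwise."""

    def __init__(self, nodes, replication_factor=3):
        if not nodes:
            raise ValueError("ring needs at least one node")
        self.replication_factor = replication_factor
        # Sort nodes by their token so the list represents the ring in clockwise order.
        self.entries = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str):
        """Return the nodes that would store this partition key."""
        # First node whose token is >= the key's token, wrapping past the end of the ring.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.entries)
        count = min(self.replication_factor, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


if __name__ == "__main__":
    ring = Ring(["cassandra-node1", "cassandra-node2", "cassandra-node3"], replication_factor=2)
    for key in ["user:1", "user:2", "sensor:42"]:
        print(key, "->", ring.replicas(key))
```

Because placement depends only on the key's hash and the sorted token list, any node can compute the replica set for any key, which is what lets every node act as a coordinator.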
Self-Hosting
# 3-node cluster
version: "3"
services:
  cassandra-node1:
    image: cassandra:5
    environment:
      CASSANDRA_CLUSTER_NAME: tokrepo
      CASSANDRA_SEEDS: cassandra-node1
  cassandra-node2:
    image: cassandra:5
    environment:
      CASSANDRA_CLUSTER_NAME: tokrepo
      CASSANDRA_SEEDS: cassandra-node1
  cassandra-node3:
    image: cassandra:5
    environment:
      CASSANDRA_CLUSTER_NAME: tokrepo
      CASSANDRA_SEEDS: cassandra-node1
Key Features
- Linear horizontal scaling
- Masterless architecture
- Tunable consistency
- Multi-DC replication
- CQL query language
- Secondary indexes
- Materialized views
- Lightweight transactions
- TTL for auto-expiration
- Battle-tested at petabyte scale
Comparison
| Database | Model | Consistency | Scale |
|---|---|---|---|
| Cassandra | Wide column | Tunable (AP) | Linear (masterless) |
| ScyllaDB | Wide column (CQL compatible) | Tunable | Linear (shard-per-core) |
| HBase | Wide column | Strong (CP) | Region servers |
| DynamoDB | Key-value + doc | Tunable | Managed |
| Bigtable | Wide column | Strong | Managed |
| MongoDB | Document | Tunable | Sharding |
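FAQ
Q: Cassandra vs ScyllaDB? A: The two are API-compatible (ScyllaDB speaks CQL). ScyllaDB is implemented in C++ with a shard-per-core architecture and is reported to perform several times better, but its commercial edition is proprietary. Cassandra is an Apache Software Foundation project with a more mature ecosystem.
Q: What workloads is it suited to? A: Large-scale time-series data, event logs, IoT, message history, and recommendation systems. It is a poor fit for workloads that need complex JOINs or strong transactions (there are no multi-table JOINs).
Q: What are the data modeling principles? A: Query-driven. Identify the query patterns first, then design the tables. Denormalization and data redundancy are the norm: one table per query.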
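The query-driven principle in the last answer can be sketched in a few lines. This is a toy in-memory model, not driver code: `WideTable`, `record_message`, and the two table names are hypothetical, chosen to show one table per query, with the same message written twice under different partition keys.

```python
from collections import defaultdict


class WideTable:
    """Toy wide-column table: partition key -> {clustering key: row}."""

    def __init__(self):
        self.partitions = defaultdict(dict)

    def insert(self, partition_key, clustering_key, row):
        # Same primary key overwrites the old row, mirroring CQL's upsert semantics.
        self.partitions[partition_key][clustering_key] = row

    def select(self, partition_key, limit=None):
        """Read one partition, rows ordered by clustering key."""
        partition = self.partitions.get(partition_key, {})
        rows = [partition[ck] for ck in sorted(partition)]
        return rows if limit is None else rows[:limit]


# One table per query pattern (hypothetical names):
messages_by_room = WideTable()    # query: "show the messages in room X, oldest first"
messages_by_sender = WideTable()  # query: "show everything user Y has sent"


def record_message(room, sender, sent_at, text):
    """Write the same row to both tables; the redundancy is deliberate."""
    row = {"room": room, "sender": sender, "sent_at": sent_at, "text": text}
    messages_by_room.insert(room, (sent_at, sender), row)
    messages_by_sender.insert(sender, (sent_at, room), row)


if __name__ == "__main__":
    record_message("general", "alice", 3, "third")
    record_message("general", "bob", 1, "first")
    record_message("random", "alice", 2, "second")
    print([m["text"] for m in messages_by_room.select("general")])
    print([m["text"] for m in messages_by_sender.select("alice")])
```

Each read touches exactly one partition and needs no join, at the cost of writing every message twice; in Cassandra the two writes would go to two separate tables whose primary keys match the two queries.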
Sources and Credits
- Docs: https://cassandra.apache.org/doc
- GitHub: https://github.com/apache/cassandra
- License: Apache 2.0