# Apache Cassandra — Distributed Wide-Column Database at Scale > Apache Cassandra is an open-source, distributed, wide-column NoSQL database. Linear scalability and proven fault-tolerance on commodity hardware. Used at Netflix, Apple, Instagram, and eBay for petabyte-scale workloads with high availability. ## Install Save as a script file and run: ## Quick Use ```bash # Docker docker run -d --name cassandra -p 9042:9042 cassandra:5 # CQL shell docker exec -it cassandra cqlsh ``` CQL usage: ```sql CREATE KEYSPACE tokrepo WITH REPLICATION = { "class": "SimpleStrategy", "replication_factor": 1 }; USE tokrepo; CREATE TABLE events_by_user ( user_id uuid, event_time timestamp, event_type text, payload text, PRIMARY KEY (user_id, event_time) ) WITH CLUSTERING ORDER BY (event_time DESC); INSERT INTO events_by_user (user_id, event_time, event_type, payload) VALUES (uuid(), toTimestamp(now()), "page_view", "/home"); -- Time-ordered query SELECT * FROM events_by_user WHERE user_id = 9b3a... AND event_time > "2026-01-01"; -- TTL (auto-delete after 7 days) INSERT INTO events_by_user (user_id, event_time, event_type) VALUES (uuid(), toTimestamp(now()), "login") USING TTL 604800; ``` ## Intro Apache Cassandra is a free and open-source, distributed, wide-column NoSQL database. Originally developed at Facebook in 2008 to handle inbox search, open-sourced in 2009, and graduated to Apache top-level in 2010. Built to provide scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure. - **Repo**: https://github.com/apache/cassandra - **Stars**: 9K+ - **Language**: Java - **License**: Apache 2.0 ## What Cassandra Does - **Wide-column model** — partition key + clustering keys + columns - **CQL query language** — SQL-like declarative syntax - **Masterless peer-to-peer** — all nodes equal (no single point of failure) - **Tunable consistency** — per-query consistency level (ONE, QUORUM, ALL) - **Multi-DC replication** — NetworkTopologyStrategy - **Lightweight transactions (LWT)** — Paxos-based compare-and-set - **Materialized views** — denormalized auto-maintained tables - **TTL** — per-cell time-to-live - **Gossip protocol** — peer state distribution - **Compaction strategies** — STCS, LCS, TWCS for different workloads ## Architecture Peer-to-peer ring: each node owns a range of partition keys determined by consistent hashing. Data is replicated to N nodes. Writes go to a memtable + commit log, flushed to SSTables. Reads merge SSTables + memtable + possibly bloom filters and row cache. ## Self-Hosting ```yaml # 3-node cluster version: "3" services: cassandra-node1: image: cassandra:5 environment: CASSANDRA_CLUSTER_NAME: tokrepo CASSANDRA_SEEDS: cassandra-node1 cassandra-node2: image: cassandra:5 environment: CASSANDRA_CLUSTER_NAME: tokrepo CASSANDRA_SEEDS: cassandra-node1 cassandra-node3: image: cassandra:5 environment: CASSANDRA_CLUSTER_NAME: tokrepo CASSANDRA_SEEDS: cassandra-node1 ``` ## Key Features - Linear horizontal scaling - Masterless architecture - Tunable consistency - Multi-DC replication - CQL query language - Secondary indexes - Materialized views - Lightweight transactions - TTL for auto-expiration - Battle-tested at petabyte scale ## Comparison | Database | Model | Consistency | Scale | |---|---|---|---| | Cassandra | Wide column | Tunable (AP) | Linear (masterless) | | ScyllaDB | Wide column (CQL compatible) | Tunable | Linear (shard-per-core) | | HBase | Wide column | Strong (CP) | Region servers | | DynamoDB | Key-value + doc | Tunable | Managed | | Bigtable | Wide column | Strong | Managed | | MongoDB | Document | Tunable | Sharding | ## 常见问题 FAQ **Q: Cassandra vs ScyllaDB?** A: API 完全兼容。ScyllaDB C++ 实现,shard-per-core 架构,性能数倍更好,但商业版是专有。Cassandra 是 Apache 基金会项目,生态更成熟。 **Q: 适合什么场景?** A: 大规模时序数据、事件日志、IoT、消息历史、推荐系统。不适合需要复杂 JOIN 或强事务的场景(没有多表 JOIN)。 **Q: 数据建模原则?** A: Query-driven。先确定查询模式,再设计表结构。反范式化、数据冗余是常态——每种查询一张表。 ## 来源与致谢 Sources - Docs: https://cassandra.apache.org/doc - GitHub: https://github.com/apache/cassandra - License: Apache 2.0 --- Source: https://tokrepo.com/en/workflows/0114283a-35f7-11f1-9bc6-00163e2b0d79 Author: Script Depot