Scripts2026年4月11日·1 分钟阅读

Apache Cassandra — Distributed Wide-Column Database at Scale

Apache Cassandra is an open-source, distributed, wide-column NoSQL database. Linear scalability and proven fault-tolerance on commodity hardware. Used at Netflix, Apple, Instagram, and eBay for petabyte-scale workloads with high availability.

SC
Script Depot · Community
快速使用

先拿来用,再决定要不要深挖

这里应该同时让用户和 Agent 知道第一步该复制什么、安装什么、落到哪里。

# Docker
docker run -d --name cassandra -p 9042:9042 cassandra:5

# CQL shell
docker exec -it cassandra cqlsh

CQL usage:

CREATE KEYSPACE tokrepo WITH REPLICATION = {
  "class": "SimpleStrategy",
  "replication_factor": 1
};

USE tokrepo;

CREATE TABLE events_by_user (
  user_id uuid,
  event_time timestamp,
  event_type text,
  payload text,
  PRIMARY KEY (user_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

INSERT INTO events_by_user (user_id, event_time, event_type, payload)
VALUES (uuid(), toTimestamp(now()), "page_view", "/home");

-- Time-ordered query
SELECT * FROM events_by_user
WHERE user_id = 9b3a...
AND event_time > "2026-01-01";

-- TTL (auto-delete after 7 days)
INSERT INTO events_by_user (user_id, event_time, event_type)
VALUES (uuid(), toTimestamp(now()), "login")
USING TTL 604800;
介绍

Apache Cassandra is a free and open-source, distributed, wide-column NoSQL database. Originally developed at Facebook in 2008 to handle inbox search, open-sourced in 2009, and graduated to Apache top-level in 2010. Built to provide scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure.

What Cassandra Does

  • Wide-column model — partition key + clustering keys + columns
  • CQL query language — SQL-like declarative syntax
  • Masterless peer-to-peer — all nodes equal (no single point of failure)
  • Tunable consistency — per-query consistency level (ONE, QUORUM, ALL)
  • Multi-DC replication — NetworkTopologyStrategy
  • Lightweight transactions (LWT) — Paxos-based compare-and-set
  • Materialized views — denormalized auto-maintained tables
  • TTL — per-cell time-to-live
  • Gossip protocol — peer state distribution
  • Compaction strategies — STCS, LCS, TWCS for different workloads

Architecture

Peer-to-peer ring: each node owns a range of partition keys determined by consistent hashing. Data is replicated to N nodes. Writes go to a memtable + commit log, flushed to SSTables. Reads merge SSTables + memtable + possibly bloom filters and row cache.

Self-Hosting

# 3-node cluster
version: "3"
services:
  cassandra-node1:
    image: cassandra:5
    environment:
      CASSANDRA_CLUSTER_NAME: tokrepo
      CASSANDRA_SEEDS: cassandra-node1
  cassandra-node2:
    image: cassandra:5
    environment:
      CASSANDRA_CLUSTER_NAME: tokrepo
      CASSANDRA_SEEDS: cassandra-node1
  cassandra-node3:
    image: cassandra:5
    environment:
      CASSANDRA_CLUSTER_NAME: tokrepo
      CASSANDRA_SEEDS: cassandra-node1

Key Features

  • Linear horizontal scaling
  • Masterless architecture
  • Tunable consistency
  • Multi-DC replication
  • CQL query language
  • Secondary indexes
  • Materialized views
  • Lightweight transactions
  • TTL for auto-expiration
  • Battle-tested at petabyte scale

Comparison

Database Model Consistency Scale
Cassandra Wide column Tunable (AP) Linear (masterless)
ScyllaDB Wide column (CQL compatible) Tunable Linear (shard-per-core)
HBase Wide column Strong (CP) Region servers
DynamoDB Key-value + doc Tunable Managed
Bigtable Wide column Strong Managed
MongoDB Document Tunable Sharding

常见问题 FAQ

Q: Cassandra vs ScyllaDB? A: API 完全兼容。ScyllaDB C++ 实现,shard-per-core 架构,性能数倍更好,但商业版是专有。Cassandra 是 Apache 基金会项目,生态更成熟。

Q: 适合什么场景? A: 大规模时序数据、事件日志、IoT、消息历史、推荐系统。不适合需要复杂 JOIN 或强事务的场景(没有多表 JOIN)。

Q: 数据建模原则? A: Query-driven。先确定查询模式,再设计表结构。反范式化、数据冗余是常态——每种查询一张表。

来源与致谢 Sources

讨论

登录后参与讨论。
还没有评论,来写第一条吧。

相关资产