How do I install Apache Cassandra — Distributed Wide-Column Database at Scale?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Apache Cassandra — Distributed Wide-Column Database at Scale

CREATE KEYSPACE tokrepo WITH REPLICATION = { "class": "SimpleStrategy", "replication_factor": 1 }; USE tokrepo; CREATE TABLE events_by_user ( user_id uuid, event_time timestamp, event_type text, payload text, PRIMARY KEY (user_id, event_time) ) WITH CLUSTERING ORDER BY (event_time DESC); INSERT INTO events_by_user (user_id, event_time, event_type, payload) VALUES (uuid(), toTimestamp(now()), "page_view", "/home"); -- Time-ordered query SELECT * FROM events_by_user WHERE user_id = 9b3a... AND event_time > "2026-01-01"; -- TTL (auto-delete after 7 days) INSERT INTO events_by_user (user_id, event_time, event_type) VALUES (uuid(), toTimestamp(now()), "login") USING TTL 604800;

What Cassandra Does

Wide-column model — partition key + clustering keys + columns
CQL query language — SQL-like declarative syntax
Masterless peer-to-peer — all nodes equal (no single point of failure)
Tunable consistency — per-query consistency level (ONE, QUORUM, ALL)
Multi-DC replication — NetworkTopologyStrategy
Lightweight transactions (LWT) — Paxos-based compare-and-set
Materialized views — denormalized auto-maintained tables
TTL — per-cell time-to-live
Gossip protocol — peer state distribution
Compaction strategies — STCS, LCS, TWCS for different workloads

Architecture

Peer-to-peer ring: each node owns a range of partition keys determined by consistent hashing. Data is replicated to N nodes. Writes go to a memtable + commit log, flushed to SSTables. Reads merge SSTables + memtable + possibly bloom filters and row cache.

Self-Hosting

# 3-node cluster
version: "3"
services:
  cassandra-node1:
    image: cassandra:5
    environment:
      CASSANDRA_CLUSTER_NAME: tokrepo
      CASSANDRA_SEEDS: cassandra-node1
  cassandra-node2:
    image: cassandra:5
    environment:
      CASSANDRA_CLUSTER_NAME: tokrepo
      CASSANDRA_SEEDS: cassandra-node1
  cassandra-node3:
    image: cassandra:5
    environment:
      CASSANDRA_CLUSTER_NAME: tokrepo
      CASSANDRA_SEEDS: cassandra-node1

Key Features

Linear horizontal scaling
Masterless architecture
Tunable consistency
Multi-DC replication
CQL query language
Secondary indexes
Materialized views
Lightweight transactions
TTL for auto-expiration
Battle-tested at petabyte scale

Comparison

Database	Model	Consistency	Scale
Cassandra	Wide column	Tunable (AP)	Linear (masterless)
ScyllaDB	Wide column (CQL compatible)	Tunable	Linear (shard-per-core)
HBase	Wide column	Strong (CP)	Region servers
DynamoDB	Key-value + doc	Tunable	Managed
Bigtable	Wide column	Strong	Managed
MongoDB	Document	Tunable	Sharding

常见问题 FAQ

Q: Cassandra vs ScyllaDB？ A: API 完全兼容。ScyllaDB C++ 实现，shard-per-core 架构，性能数倍更好，但商业版是专有。Cassandra 是 Apache 基金会项目，生态更成熟。

Q: 适合什么场景？ A: 大规模时序数据、事件日志、IoT、消息历史、推荐系统。不适合需要复杂 JOIN 或强事务的场景（没有多表 JOIN）。

Q: 数据建模原则？ A: Query-driven。先确定查询模式，再设计表结构。反范式化、数据冗余是常态——每种查询一张表。

来源与致谢 Sources

Docs: https://cassandra.apache.org/doc
GitHub: https://github.com/apache/cassandra
License: Apache 2.0

Apache Cassandra — Distributed Wide-Column Database at Scale

先拿来用，再决定要不要深挖

What Cassandra Does

Architecture

Self-Hosting

Key Features

Comparison

常见问题 FAQ

来源与致谢 Sources

讨论

相关资产

Elixir — Dynamic Functional Language for Scalable Apps

Kotlin — Modern Statically Typed Language for JVM and Beyond

Flutter — Google Cross-Platform UI Toolkit for Beautiful Apps