Cette page est affichée en anglais. Une traduction française est en cours.
ScriptsApr 11, 2026·3 min de lecture

Apache Cassandra — Distributed Wide-Column Database at Scale

Apache Cassandra is an open-source, distributed, wide-column NoSQL database. Linear scalability and proven fault-tolerance on commodity hardware. Used at Netflix, Apple, Instagram, and eBay for petabyte-scale workloads with high availability.

Introduction

Apache Cassandra is a free and open-source, distributed, wide-column NoSQL database. Originally developed at Facebook in 2008 to handle inbox search, open-sourced in 2009, and graduated to Apache top-level in 2010. Built to provide scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure.

What Cassandra Does

  • Wide-column model — partition key + clustering keys + columns
  • CQL query language — SQL-like declarative syntax
  • Masterless peer-to-peer — all nodes equal (no single point of failure)
  • Tunable consistency — per-query consistency level (ONE, QUORUM, ALL)
  • Multi-DC replication — NetworkTopologyStrategy
  • Lightweight transactions (LWT) — Paxos-based compare-and-set
  • Materialized views — denormalized auto-maintained tables
  • TTL — per-cell time-to-live
  • Gossip protocol — peer state distribution
  • Compaction strategies — STCS, LCS, TWCS for different workloads

Architecture

Peer-to-peer ring: each node owns a range of partition keys determined by consistent hashing. Data is replicated to N nodes. Writes go to a memtable + commit log, flushed to SSTables. Reads merge SSTables + memtable + possibly bloom filters and row cache.

Self-Hosting

# 3-node cluster
version: "3"
services:
  cassandra-node1:
    image: cassandra:5
    environment:
      CASSANDRA_CLUSTER_NAME: tokrepo
      CASSANDRA_SEEDS: cassandra-node1
  cassandra-node2:
    image: cassandra:5
    environment:
      CASSANDRA_CLUSTER_NAME: tokrepo
      CASSANDRA_SEEDS: cassandra-node1
  cassandra-node3:
    image: cassandra:5
    environment:
      CASSANDRA_CLUSTER_NAME: tokrepo
      CASSANDRA_SEEDS: cassandra-node1

Key Features

  • Linear horizontal scaling
  • Masterless architecture
  • Tunable consistency
  • Multi-DC replication
  • CQL query language
  • Secondary indexes
  • Materialized views
  • Lightweight transactions
  • TTL for auto-expiration
  • Battle-tested at petabyte scale

Comparison

Database Model Consistency Scale
Cassandra Wide column Tunable (AP) Linear (masterless)
ScyllaDB Wide column (CQL compatible) Tunable Linear (shard-per-core)
HBase Wide column Strong (CP) Region servers
DynamoDB Key-value + doc Tunable Managed
Bigtable Wide column Strong Managed
MongoDB Document Tunable Sharding

FAQ

Q: Cassandra vs ScyllaDB? A: Fully API-compatible. ScyllaDB is implemented in C++ with a shard-per-core architecture and performs several times better, but its commercial edition is proprietary. Cassandra is an Apache Foundation project with a more mature ecosystem.

Q: What scenarios is it good for? A: Large-scale time series data, event logs, IoT, message history, and recommendation systems. Not suitable for scenarios requiring complex JOINs or strong transactions (no multi-table JOINs).

Q: Data modeling principles? A: Query-driven. Decide on your query patterns first, then design your table schema. Denormalization and data duplication are the norm — one table per query pattern.

Sources

Discussion

Connectez-vous pour rejoindre la discussion.
Aucun commentaire pour l'instant. Soyez le premier à partager votre avis.

Actifs similaires