What Kafka Does
- Publish and subscribe — producers write, consumers read
- Topics and partitions — horizontally scalable logs
- Persistence — durable disk storage with configurable retention
- Replication — per-partition replicas across brokers
- Consumer groups — parallel consumption with auto rebalance
- Streams API — stateful stream processing
- Connect — pre-built integrations (JDBC, S3, Elastic, etc.)
- Exactly-once — transactional semantics
- KRaft — Raft-based metadata (replaces ZooKeeper)
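The consumer-group item above can be illustrated with a small sketch. It assumes a simplified round-robin strategy and a hypothetical helper named `assign_partitions`; Kafka ships several assignors, and this is not a copy of any of them. The point it shows is only that each partition is owned by exactly one group member, and that a rebalance amounts to recomputing that ownership when membership changes.

```python
def assign_partitions(partitions, members):
    """Spread partitions over group members round-robin (simplified sketch).

    Hypothetical helper for illustration; not Kafka's actual assignor code.
    """
    if not members:
        return {}
    ordered = sorted(members)
    assignment = {m: [] for m in ordered}
    for i, p in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(p)
    return assignment


partitions = list(range(6))

# Two consumers share six partitions.
before = assign_partitions(partitions, ["c1", "c2"])

# A third consumer joins, so the group recomputes the assignment (a rebalance).
after = assign_partitions(partitions, ["c1", "c2", "c3"])

print(before)
print(after)
```

Adding more consumers than there are partitions would leave the extra members idle in this model, which is why partition count bounds a group's parallelism.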
Architecture
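Brokers form a cluster, and each broker holds a share of the partition replicas. Producers append records to partitions; by default a keyed record is routed by hashing its key, so records with the same key land in the same partition. Consumers pull from partitions and track their position as offsets. In KRaft mode, controller nodes manage cluster metadata in place of ZooKeeper (production-ready since 3.3; ZooKeeper mode was deprecated in 3.5 and removed in 4.0).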
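A minimal in-memory sketch of the produce and consume path described above, under stated assumptions: `TopicSketch` and `ConsumerSketch` are hypothetical names, no broker or client library is involved, and CRC32 stands in for the murmur2 hash that Kafka's Java client applies to keys. It shows that equal keys map to the same partition and that a consumer's progress is just a per-partition offset.

```python
import zlib
from collections import defaultdict


class TopicSketch:
    """In-memory stand-in for a topic: one append-only list per partition."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 replaces the murmur2 hash used by the real client.
        return zlib.crc32(key) % len(self.partitions)

    def produce(self, key, value):
        """Append a record and return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


class ConsumerSketch:
    """Pulls records and remembers the next offset to read per partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)  # partition -> next offset to read

    def poll(self, partition, max_records=10):
        start = self.offsets[partition]
        batch = self.topic.partitions[partition][start:start + max_records]
        self.offsets[partition] = start + len(batch)
        return batch


topic = TopicSketch(num_partitions=3)
p1, o1 = topic.produce(b"user-42", "login")
p2, o2 = topic.produce(b"user-42", "logout")  # same key, same partition

consumer = ConsumerSketch(topic)
first = consumer.poll(p1)   # both records, in append order
second = consumer.poll(p1)  # empty: the offset already points past the end
```

Because reading only advances the consumer's own offset and never removes records, a second `ConsumerSketch` over the same topic would see both records again from offset 0.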
Self-Hosting
# Docker Compose (single broker)
version: "3"
services:
  kafka:
    image: bitnami/kafka:3.7
    ports:
      - "9092:9092"
    environment:
      KAFKA_CFG_NODE_ID: 1
      KAFKA_CFG_PROCESS_ROLES: controller,broker
      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
Key Features
- Distributed commit log
- Horizontal scale via partitions
- Replication for durability
- Consumer groups for parallelism
- Exactly-once transactional semantics
- Kafka Connect ecosystem
- Kafka Streams for stateful processing
- KRaft mode (no ZooKeeper)
- MirrorMaker for cross-cluster replication
- Schema Registry (Confluent)
Comparison
| System | Model | Durability | Ecosystem |
|---|---|---|---|
| Kafka | Distributed log | Disk + replicas | Largest |
| Redpanda | Kafka-compatible (C++) | Disk + replicas | Kafka-compatible |
| Pulsar | Segmented storage | BookKeeper | Growing |
| NATS JetStream | Streaming | Disk | Simpler |
| RabbitMQ | Traditional MQ | Persistent queues | Mature |
FAQ
Q: How does Kafka differ from RabbitMQ? A: Kafka is a distributed log: messages are persisted, kept according to a retention policy, and can be re-read by consumers at high throughput. RabbitMQ is a traditional message queue: messages are routed to queues, delivered in FIFO order, and removed once acknowledged. Choose Kafka for streaming data and event sourcing; choose RabbitMQ for asynchronous task queues.
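Q: Do I still need ZooKeeper? A: Not for new clusters. KRaft mode has been production-ready since 3.3, ZooKeeper mode was deprecated in 3.5, and Kafka 4.0 removed ZooKeeper support entirely. Running without ZooKeeper simplifies deployment.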
Q: How is the performance? A: A single broker can handle hundreds of thousands of msg/s, depending on message size, hardware, and configuration. LinkedIn has reported more than 7 trillion messages per day across its Kafka deployment.
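To put the 7 trillion messages per day figure from the last answer on the same scale as a per-second rate, the conversion is simple arithmetic. The snippet below is only a unit conversion of the quoted number into a daily average; it says nothing about peak rates, message sizes, or how many brokers are involved.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

messages_per_day = 7_000_000_000_000  # 7 trillion, as quoted in the FAQ
avg_per_second = messages_per_day / SECONDS_PER_DAY

# Roughly 81 million messages per second when averaged over a full day.
print(f"{avg_per_second:,.0f} msg/s on average")
```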
Sources
- Docs: https://kafka.apache.org/documentation
- GitHub: https://github.com/apache/kafka
- License: Apache 2.0