What Kafka Does
- Publish and subscribe — producers write, consumers read
- Topics and partitions — horizontally scalable logs
- Persistence — durable disk storage with configurable retention
- Replication — per-partition replicas across brokers
- Consumer groups — parallel consumption with auto rebalance
- Streams API — stateful stream processing
- Connect — pre-built integrations (JDBC, S3, Elastic, etc.)
- Exactly-once — transactional semantics
- KRaft — Raft-based metadata (replaces ZooKeeper)
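To make the consumer-group item above concrete, here is a minimal sketch of how partitions might be spread over group members and recomputed when membership changes. The `assign` helper and the consumer names are hypothetical, and the round-robin logic is a simplification for illustration; Kafka's built-in assignors (range, round-robin, sticky) are more involved.

```python
def assign(partitions: list[int], members: list[str]) -> dict[str, list[int]]:
    """Spread partitions over group members round-robin.

    Simplified illustration only, not Kafka's actual assignor code.
    Each partition ends up owned by exactly one member of the group.
    """
    ordered = sorted(members)
    assignment: dict[str, list[int]] = {m: [] for m in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


partitions = list(range(6))  # hypothetical topic with 6 partitions

# Two consumers share the topic.
before = assign(partitions, ["consumer-a", "consumer-b"])

# A third member joins: the group rebalances and ownership is recomputed.
after = assign(partitions, ["consumer-a", "consumer-b", "consumer-c"])

# More members than partitions: the extra member gets nothing to read.
idle = assign([0, 1], ["consumer-a", "consumer-b", "consumer-c"])
```

The last call shows why the partition count caps parallelism within a group: a member beyond the number of partitions sits idle.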
Architecture
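Brokers form a cluster, and each broker holds a share of the partition replicas. Producers pick a partition for every record: a record with a key is hashed to a partition, so records with the same key land in the same partition as long as the partition count does not change. Consumers pull from partitions and track their position as offsets. In KRaft mode (production-ready for new clusters since 3.3), controller nodes manage cluster metadata instead of ZooKeeper.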
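The write and read paths described above can be sketched with a toy in-memory log. This is an illustration under stated assumptions, not Kafka client code: `ToyTopic`, `choose_partition`, and the partition count are hypothetical, and CRC32 stands in for the murmur2 hash that Kafka's Java client uses on keys.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for the toy topic


def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition.

    Stand-in hash: Kafka's Java client uses murmur2 here, not CRC32.
    """
    return zlib.crc32(key) % num_partitions


class ToyTopic:
    """In-memory stand-in for a partitioned, append-only log."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS) -> None:
        self.partitions: list[list[str]] = [[] for _ in range(num_partitions)]

    def produce(self, key: bytes, value: str) -> tuple[int, int]:
        """Append a record and return (partition, offset)."""
        p = choose_partition(key, len(self.partitions))
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1

    def poll(self, partition: int, offset: int, max_records: int = 10) -> list[str]:
        """Return records starting at `offset`; reading does not remove them."""
        return self.partitions[partition][offset:offset + max_records]


topic = ToyTopic()
for i in range(6):
    topic.produce(b"user-42", f"event-{i}")  # same key, same partition

p = choose_partition(b"user-42")
committed = defaultdict(int)  # consumer-side offsets, one per partition

batch = topic.poll(p, committed[p], max_records=4)
committed[p] += len(batch)          # "commit" after processing the batch
rest = topic.poll(p, committed[p])  # a restart would resume from here
```

Because the consumer owns its offset and reads never delete data, a second consumer could replay the same partition from offset 0 independently.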
Self-Hosting
# Docker Compose (single broker)
version: "3"
services:
  kafka:
    image: bitnami/kafka:3.7
    ports:
      - "9092:9092"
    environment:
      KAFKA_CFG_NODE_ID: 1
      KAFKA_CFG_PROCESS_ROLES: controller,broker
      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
Key Features
- Distributed commit log
- Horizontal scale via partitions
- Replication for durability
- Consumer groups for parallelism
- Exactly-once transactional semantics
- Kafka Connect ecosystem
- Kafka Streams for stateful processing
- KRaft mode (no ZooKeeper)
- MirrorMaker for cross-cluster replication
- Schema Registry (Confluent)
Comparison
| System | Model | Durability | Ecosystem |
|---|---|---|---|
| Kafka | Distributed log | Disk + replicas | Largest |
| Redpanda | Kafka-compatible (C++) | Disk + replicas | Kafka-compatible |
| Pulsar | Segmented storage | BookKeeper | Growing |
| NATS JetStream | Streaming | Disk | Simpler |
| RabbitMQ | Traditional MQ | Persistent queues | Mature |
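FAQ
Q: How does it differ from RabbitMQ? A: Kafka is a distributed log (durable storage, time-based retention, high throughput); RabbitMQ is a traditional message queue (FIFO, acks, routing). Choose Kafka for streaming data and event sourcing; choose RabbitMQ for asynchronous task queues.
Q: Is ZooKeeper still needed? A: No. KRaft mode has been production-ready for new clusters since 3.3, ZooKeeper mode was deprecated in 3.5, and Kafka 4.0 removes it. New clusters do not need ZooKeeper, which simplifies deployment.
Q: How does it perform? A: A single broker can comfortably handle hundreds of thousands of messages per second. LinkedIn has reported more than 7 trillion messages per day across its Kafka deployment.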
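For a sense of scale, the 7 trillion messages per day figure quoted above can be converted to an average per-second rate. The per-broker rate below is a hypothetical round number chosen only for the arithmetic; real capacity depends on message size, replication factor, and peak-to-average ratio, none of which this sketch models.

```python
import math

MESSAGES_PER_DAY = 7_000_000_000_000  # figure quoted in the FAQ
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate, ignoring daily peaks.
avg_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY

# Hypothetical sustained rate for one broker (assumption, not a benchmark).
ASSUMED_BROKER_RATE = 300_000

# Lower bound on broker count at that rate, before replication overhead.
min_brokers = math.ceil(avg_per_second / ASSUMED_BROKER_RATE)

print(f"{avg_per_second:,.0f} msg/s on average")
print(f"at least {min_brokers} brokers at the assumed per-broker rate")
```

The average works out to roughly 81 million messages per second, which is why a workload of that size is spread over many brokers rather than served by a handful.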
Sources
- Docs: https://kafka.apache.org/documentation
- GitHub: https://github.com/apache/kafka
- License: Apache 2.0