Scripts · Apr 11, 2026 · 1 min read

Apache Kafka — Distributed Event Streaming Platform

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, and mission-critical applications. Deployments at LinkedIn, Netflix, and Uber handle trillions of messages per day.

Script Depot · Community
Quick Use

Use it first, then decide how deep to go


Start Kafka locally with KRaft (no ZooKeeper needed since v3.5):

# Download
curl -O https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz
tar xzf kafka_2.13-3.7.0.tgz && cd kafka_2.13-3.7.0

# Generate cluster ID
KAFKA_CLUSTER_ID=$(bin/kafka-storage.sh random-uuid)
bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties

# Start broker
bin/kafka-server-start.sh config/kraft/server.properties

Produce and consume:

# Create topic
bin/kafka-topics.sh --create --topic orders --bootstrap-server localhost:9092

# Producer
bin/kafka-console-producer.sh --topic orders --bootstrap-server localhost:9092
> { "id": 1, "amount": 49.99 }

# Consumer
bin/kafka-console-consumer.sh --topic orders --from-beginning --bootstrap-server localhost:9092

Intro

Apache Kafka is a distributed event streaming platform originally created at LinkedIn by Jay Kreps, Neha Narkhede, and Jun Rao, and open-sourced in 2011; it is now a top-level project of the Apache Software Foundation. Kafka powers data pipelines at thousands of companies, handling trillions of messages per day.

What Kafka Does

  • Publish and subscribe — producers write, consumers read
  • Topics and partitions — horizontally scalable logs
  • Persistence — durable disk storage with configurable retention
  • Replication — per-partition replicas across brokers
  • Consumer groups — parallel consumption with auto rebalance
  • Streams API — stateful stream processing
  • Connect — pre-built integrations (JDBC, S3, Elastic, etc.)
  • Exactly-once — transactional semantics
  • KRaft — Raft-based metadata (replaces ZooKeeper)
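
The partition and consumer-group mechanics listed above can be sketched as a toy in-memory model. This is an illustration only, not a client API: `MiniLog`, `MiniGroup`, and their methods are invented names, and real applications would use a Kafka client library against a running broker.

```python
from collections import defaultdict

class MiniLog:
    """Toy topic: a fixed set of append-only partitions (plain lists)."""
    def __init__(self, partitions=3):
        self.partitions = [[] for _ in range(partitions)]

    def append(self, partition, record):
        self.partitions[partition].append(record)
        return len(self.partitions[partition]) - 1  # offset of the new record

class MiniGroup:
    """Toy consumer group: partitions are divided among members,
    and a committed offset is tracked per partition."""
    def __init__(self, log, members):
        self.log = log
        # Round-robin partition assignment (a simplification of Kafka's assignors)
        self.assignment = defaultdict(list)
        for p in range(len(log.partitions)):
            self.assignment[members[p % len(members)]].append(p)
        self.offsets = [0] * len(log.partitions)

    def poll(self, member):
        """Return every record past the committed offset in this member's partitions."""
        out = []
        for p in self.assignment[member]:
            while self.offsets[p] < len(self.log.partitions[p]):
                out.append(self.log.partitions[p][self.offsets[p]])
                self.offsets[p] += 1  # commit as we go
        return out
```

Two members of the same group own disjoint partitions, so records are consumed in parallel without duplication; that division of work is the property consumer groups provide, and the per-partition offset is what lets a member resume where it left off.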

Architecture

Brokers form a cluster, each holding replicas of topic partitions. Producers write to partitions, typically choosing one by hashing the record key. Consumers pull from partitions and track their progress as offsets. In KRaft mode (v3.5+), controller nodes manage cluster metadata instead of ZooKeeper.
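
The key-hashing step can be illustrated with a short Python sketch of Kafka's default keyed partitioning scheme: a murmur2 hash of the key bytes, masked to a non-negative value, modulo the partition count. The function names here are ours, not part of any client API.

```python
def murmur2(data: bytes) -> int:
    """32-bit murmur2 hash, mirroring the variant Kafka uses for record keys."""
    seed, m, r = 0x9747B28C, 0x5BD1E995, 24
    h = seed ^ len(data)
    # Mix 4-byte little-endian chunks
    for i in range(len(data) // 4):
        k = int.from_bytes(data[i * 4:(i + 1) * 4], "little")
        k = (k * m) & 0xFFFFFFFF
        k ^= k >> r
        k = (k * m) & 0xFFFFFFFF
        h = ((h * m) & 0xFFFFFFFF) ^ k
    # Mix the trailing 1-3 bytes, if any
    extra = len(data) % 4
    if extra:
        tail = data[len(data) - extra:]
        if extra >= 3:
            h ^= tail[2] << 16
        if extra >= 2:
            h ^= tail[1] << 8
        h ^= tail[0]
        h = (h * m) & 0xFFFFFFFF
    # Final avalanche
    h ^= h >> 13
    h = (h * m) & 0xFFFFFFFF
    h ^= h >> 15
    return h

def partition_for(key: bytes, num_partitions: int) -> int:
    """Mask to non-negative (as Java does), then take the hash mod partition count."""
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions
```

Because the mapping is deterministic, every record with the same key lands on the same partition, which is what gives Kafka its per-key ordering guarantee.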

Self-Hosting

# Docker Compose (single broker)
version: "3"
services:
  kafka:
    image: bitnami/kafka:3.7
    ports:
      - "9092:9092"
    environment:
      KAFKA_CFG_NODE_ID: 1
      KAFKA_CFG_PROCESS_ROLES: controller,broker
      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER

Key Features

  • Distributed commit log
  • Horizontal scale via partitions
  • Replication for durability
  • Consumer groups for parallelism
  • Exactly-once transactional semantics
  • Kafka Connect ecosystem
  • Kafka Streams for stateful processing
  • KRaft mode (no ZooKeeper)
  • MirrorMaker for cross-cluster replication
  • Schema Registry (Confluent)

Comparison

System          Model                    Durability          Ecosystem
Kafka           Distributed log          Disk + replicas     Largest
Redpanda        Kafka-compatible (C++)   Disk + replicas     Kafka-compatible
Pulsar          Segmented storage        BookKeeper          Growing
NATS JetStream  Streaming                Disk                Simpler
RabbitMQ        Traditional MQ           Persistent queues   Mature

FAQ

Q: How does it differ from RabbitMQ? A: Kafka is a distributed log (durable storage, time-based retention, high throughput); RabbitMQ is a traditional message queue (FIFO delivery, acknowledgements, routing). Choose Kafka for streaming data and event sourcing; choose RabbitMQ for asynchronous task queues.

Q: Is ZooKeeper still required? A: KRaft mode is GA as of v3.5. New clusters no longer need ZooKeeper, which simplifies deployment.

Q: How does it perform? A: A single broker easily handles hundreds of thousands of messages per second. At LinkedIn, peak volume has reached 7 trillion messages per day.

