Scripts · April 11, 2026 · 1 min read

Apache Kafka — Distributed Event Streaming Platform

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, and mission-critical applications. Companies such as LinkedIn, Netflix, and Uber run it at the scale of trillions of messages per day.

Script Depot · Community
Quick Start

Use it first; decide whether to dig deeper later.

This section tells both users and agents what to copy first, what to install, and where it lands.

Start Kafka locally with KRaft (production-ready since v3.3, so no ZooKeeper is needed):

# Download
curl -O https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz
tar xzf kafka_2.13-3.7.0.tgz && cd kafka_2.13-3.7.0

# Generate cluster ID
KAFKA_CLUSTER_ID=$(bin/kafka-storage.sh random-uuid)
bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties

# Start broker
bin/kafka-server-start.sh config/kraft/server.properties

Produce and consume:

# Create topic
bin/kafka-topics.sh --create --topic orders --bootstrap-server localhost:9092

# Producer
bin/kafka-console-producer.sh --topic orders --bootstrap-server localhost:9092
> { "id": 1, "amount": 49.99 }

# Consumer
bin/kafka-console-consumer.sh --topic orders --from-beginning --bootstrap-server localhost:9092
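Each topic behind these commands is a set of append-only, offset-addressed partition logs. A toy sketch of that abstraction (just the model, not the real broker):

```python
class PartitionLog:
    """Toy model of one Kafka topic partition: an append-only log
    where each record gets a monotonically increasing offset."""

    def __init__(self):
        self._records = []

    def append(self, value) -> int:
        """Append a record and return its offset."""
        self._records.append(value)
        return len(self._records) - 1

    def read_from(self, offset: int):
        """Return all records at or after the given offset
        (offset 0 is what --from-beginning reads from)."""
        return self._records[offset:]


log = PartitionLog()
log.append({"id": 1, "amount": 49.99})
log.append({"id": 2, "amount": 12.50})
print(log.read_from(0))  # both records, oldest first
```

Unlike a traditional queue, reading does not remove records: every consumer keeps its own offset into the same immutable log.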

Introduction

Apache Kafka is a distributed event streaming platform originally created at LinkedIn (by Jay Kreps, Neha Narkhede, and Jun Rao) and open-sourced in 2011. It became a top-level Apache Software Foundation project in 2012. Kafka powers data pipelines at thousands of companies, handling trillions of messages per day.

What Kafka Does

  • Publish and subscribe — producers write, consumers read
  • Topics and partitions — horizontally scalable logs
  • Persistence — durable disk storage with configurable retention
  • Replication — per-partition replicas across brokers
  • Consumer groups — parallel consumption with auto rebalance
  • Streams API — stateful stream processing
  • Connect — pre-built integrations (JDBC, S3, Elastic, etc.)
  • Exactly-once — transactional semantics
  • KRaft — Raft-based metadata (replaces ZooKeeper)
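
Consumer-group parallelism works by dividing a topic's partitions among the group's members. A minimal sketch of a range-style assignment (simplified from Kafka's actual range assignor, which works per topic):

```python
def assign_partitions(partitions, consumers):
    """Divide partitions among consumers, range-assignor style:
    sort the members, hand each a contiguous block, and let the
    first (len(partitions) % len(consumers)) members take one extra."""
    members = sorted(consumers)
    base, extra = divmod(len(partitions), len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = base + (1 if i < extra else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment


# 6 partitions, 4 consumers: two members get 2 partitions, two get 1
print(assign_partitions(list(range(6)), ["c1", "c2", "c3", "c4"]))
```

When a member joins or leaves, the group coordinator reruns an assignment like this; that is the "auto rebalance" in the list above.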

Architecture

Brokers form a cluster, each holding partition replicas. Producers write to partitions, choosing a partition by hashing the record key. Consumers pull from partitions and track their progress as offsets. In KRaft mode, controller nodes manage cluster metadata instead of ZooKeeper.
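
That key-based routing can be illustrated with a stand-in hash (Kafka's default partitioner actually uses murmur2, not CRC32; the point is only that the same key always maps to the same partition):

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition for a keyed record. CRC32 stands in for
    Kafka's murmur2 here; what matters is determinism per key."""
    return (zlib.crc32(key) & 0x7FFFFFFF) % num_partitions


# Same key -> same partition, so per-key ordering is preserved
assert partition_for(b"user-42", 6) == partition_for(b"user-42", 6)
```

This determinism is why all events for one key (say, one order ID) land on one partition and are consumed in order, while different keys spread across partitions for parallelism.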

Self-Hosting

# Docker Compose (single broker)
version: "3"
services:
  kafka:
    image: bitnami/kafka:3.7
    ports:
      - "9092:9092"
    environment:
      KAFKA_CFG_NODE_ID: 1
      KAFKA_CFG_PROCESS_ROLES: controller,broker
      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER

Key Features

  • Distributed commit log
  • Horizontal scale via partitions
  • Replication for durability
  • Consumer groups for parallelism
  • Exactly-once transactional semantics
  • Kafka Connect ecosystem
  • Kafka Streams for stateful processing
  • KRaft mode (no ZooKeeper)
  • MirrorMaker for cross-cluster replication
  • Schema Registry (Confluent)
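
Exactly-once delivery per partition rests on idempotent producers: the broker remembers the highest sequence number accepted from each producer and silently drops retried duplicates. A toy model of that bookkeeping (simplified; real Kafka tracks this per producer epoch and partition):

```python
class Broker:
    """Toy duplicate filter: accept an append only if its sequence
    number is higher than the last one seen from that producer."""

    def __init__(self):
        self.records = []
        self._last_seq = {}  # producer_id -> last accepted sequence

    def append(self, producer_id: str, seq: int, value) -> bool:
        if self._last_seq.get(producer_id, -1) >= seq:
            return False  # duplicate retry; record already stored
        self._last_seq[producer_id] = seq
        self.records.append(value)
        return True


b = Broker()
b.append("p1", 0, "order-created")
b.append("p1", 0, "order-created")  # network retry of the same send
print(b.records)  # the record appears exactly once
```

Transactions extend this with atomic writes across partitions, so a consume-process-produce step either fully commits or fully aborts.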

Comparison

System          Model                    Durability          Ecosystem
Kafka           Distributed log          Disk + replicas     Largest
Redpanda        Kafka-compatible (C++)   Disk + replicas     Kafka-compatible
Pulsar          Segmented storage        BookKeeper          Growing
NATS JetStream  Streaming                Disk                Simpler
RabbitMQ        Traditional MQ           Persistent queues   Mature

FAQ

Q: How is it different from RabbitMQ? A: Kafka is a distributed log (durable storage, time-based retention, high throughput); RabbitMQ is a traditional message queue (FIFO, acks, routing). Pick Kafka for streaming data and event sourcing; pick RabbitMQ for asynchronous task queues.
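
Time-based retention means the broker deletes whole aged-out log segments, rather than deleting each message on acknowledgement the way a queue does. A minimal sketch of segment-level cleanup, roughly what `retention.ms` controls:

```python
def retention_cleanup(segments, retention_ms, now_ms):
    """Keep only segments whose newest record is still inside the
    retention window; Kafka deletes whole segments, never single records."""
    return [s for s in segments if now_ms - s["max_timestamp"] <= retention_ms]


segments = [
    {"base_offset": 0, "max_timestamp": 1_000},      # old segment
    {"base_offset": 100, "max_timestamp": 90_000},   # recent segment
]
# With 60s retention at t=100s, only the recent segment survives
print(retention_cleanup(segments, retention_ms=60_000, now_ms=100_000))
```

Because deletion is decoupled from consumption, multiple consumers can replay the same data independently for as long as retention allows.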

Q: Is ZooKeeper still needed? A: KRaft mode is GA (production-ready since v3.3, with ZooKeeper mode deprecated in v3.5). New clusters no longer need ZooKeeper, which simplifies deployment.

Q: How is performance? A: A single broker easily handles several hundred thousand messages per second. LinkedIn has reported over 7 trillion messages per day across its Kafka deployment.

