Scripts · Apr 11, 2026 · 2 min read

Apache Kafka — Distributed Event Streaming Platform

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, and mission-critical applications. Deployments at LinkedIn, Netflix, and Uber handle trillions of messages per day.

Introduction

Apache Kafka was originally created at LinkedIn by Jay Kreps, Neha Narkhede, and Jun Rao, and was open-sourced in 2011. It was later donated to the Apache Software Foundation, which maintains it today.

What Kafka Does

  • Publish and subscribe — producers write, consumers read
  • Topics and partitions — horizontally scalable logs
  • Persistence — durable disk storage with configurable retention
  • Replication — per-partition replicas across brokers
  • Consumer groups — parallel consumption with auto rebalance
  • Streams API — stateful stream processing
  • Connect — pre-built integrations (JDBC, S3, Elastic, etc.)
  • Exactly-once — transactional semantics
  • KRaft — Raft-based metadata (replaces ZooKeeper)
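To make the consumer-group bullet more concrete, here is a minimal sketch in plain Python (no Kafka client involved) of how a topic's partitions might be spread across group members and redistributed when one member leaves. The function `assign_round_robin` and the member names are hypothetical; Kafka's real assignors (range, round-robin, sticky, cooperative) are more involved than this.

```python
def assign_round_robin(partitions, members):
    """Spread partitions over group members in round-robin order.

    Illustrative only: this is not Kafka's actual assignor code.
    """
    if not members:
        return {}
    ordered = sorted(members)  # deterministic order so every run agrees
    assignment = {m: [] for m in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


partitions = list(range(6))  # a topic with 6 partitions
before = assign_round_robin(partitions, ["c1", "c2", "c3"])
after = assign_round_robin(partitions, ["c1", "c3"])  # c2 left the group

print(before)
print(after)
```

Each partition is owned by exactly one member at a time, which is why adding consumers beyond the partition count would leave the extra consumers idle in this model.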

Architecture

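Brokers form a cluster, and each broker holds replicas of some partitions. Producers write to a partition of a topic: records with a key are assigned by hashing the key, so records with the same key land in the same partition. Consumers pull from partitions and track their position as offsets. In KRaft mode (production-ready since Kafka 3.3), controller nodes manage cluster metadata instead of ZooKeeper.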
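The key-based routing described above can be sketched in a few lines. This is an illustration under stated assumptions, not Kafka's implementation: `choose_partition` is a hypothetical helper, and CRC32 stands in for the hash (Kafka's Java client uses murmur2, so the indices below will not match a real cluster).

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index.

    Assumption: CRC32 is a stand-in hash chosen because it ships with
    Python; it is not the hash Kafka's clients use.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


# The same key always maps to the same partition, which is what
# preserves per-key ordering.
for key in (b"user-1", b"user-2", b"user-1"):
    print(key, "->", choose_partition(key, 6))
```

Because the partition index depends on the partition count, changing the number of partitions would remap keys in this scheme, which is worth keeping in mind when ordering per key matters.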

Self-Hosting

# Docker Compose (single broker, KRaft combined mode)
version: "3"
services:
  kafka:
    image: bitnami/kafka:3.7
    ports:
      - "9092:9092"   # client listener exposed to the host
    environment:
      KAFKA_CFG_NODE_ID: 1
      # One node acts as both KRaft controller and broker
      KAFKA_CFG_PROCESS_ROLES: controller,broker
      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      # Clients on the host connect via localhost:9092
      KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
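
Once the container is up, a quick way to confirm that something is listening on the published port is a plain TCP check. The sketch below uses only the Python standard library; `broker_reachable` is a hypothetical helper, and the defaults assume the `localhost:9092` listener from the Compose file above. It does not speak the Kafka protocol, so a successful result only shows the port is open.

```python
import socket


def broker_reachable(host: str = "localhost", port: int = 9092,
                     timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    This only proves that a listener accepts connections; it does not
    verify that the listener is a healthy Kafka broker.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("broker reachable:", broker_reachable())
```

For a real health check, a Kafka client or the CLI tools bundled with the broker would be the better choice, since they exercise the protocol itself.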

Key Features

  • Distributed commit log
  • Horizontal scale via partitions
  • Replication for durability
  • Consumer groups for parallelism
  • Exactly-once transactional semantics
  • Kafka Connect ecosystem
  • Kafka Streams for stateful processing
  • KRaft mode (no ZooKeeper)
  • MirrorMaker for cross-cluster replication
  • Schema Registry (a Confluent project, not part of Apache Kafka itself)
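
As a rough illustration of one building block behind the exactly-once feature listed above, the sketch below shows how per-producer sequence numbers let a log discard a retried write instead of storing it twice. It is a deliberately simplified model: `PartitionLog` is a hypothetical class, and real brokers also track batches, producer epochs, and transaction state.

```python
class PartitionLog:
    """Toy partition that deduplicates retried appends.

    Assumption: each producer numbers its records 0, 1, 2, ... per
    partition, and the log remembers the last number it accepted.
    """

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer_id -> last accepted sequence number

    def append(self, producer_id, seq, value):
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            return False  # duplicate caused by a retry: ignore it
        if seq != last + 1:
            raise ValueError("gap in sequence numbers")
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return True


log = PartitionLog()
log.append("p1", 0, "order-created")
log.append("p1", 1, "order-paid")
retry_accepted = log.append("p1", 1, "order-paid")  # network retry
print(log.records, retry_accepted)
```

The retry is acknowledged without being written again, so the log keeps exactly one copy of each record even when the producer cannot tell whether its first attempt succeeded.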

Comparison

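| System         | Model                  | Durability        | Ecosystem        |
|----------------|------------------------|-------------------|------------------|
| Kafka          | Distributed log        | Disk + replicas   | Largest          |
| Redpanda       | Kafka-compatible (C++) | Disk + replicas   | Kafka-compatible |
| Pulsar         | Segmented storage      | BookKeeper        | Growing          |
| NATS JetStream | Streaming              | Disk              | Simpler          |
| RabbitMQ       | Traditional MQ         | Persistent queues | Mature           |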

FAQ

Q: How does Kafka differ from RabbitMQ? A: Kafka is a distributed log with persistent storage, time-based retention, and high throughput; RabbitMQ is a traditional message queue built around FIFO delivery, acknowledgements, and routing. Choose Kafka for streaming data and event sourcing; choose RabbitMQ for asynchronous task queues.
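
The difference in read semantics can be sketched with two toy structures. This is a simplification for illustration only: `Log` is a hypothetical class modelling a single partition, and a `deque` stands in for a classic work queue where a consumed message is gone.

```python
from collections import deque


class Log:
    """Append-only log: reading does not remove records."""

    def __init__(self):
        self.records = []
        self.offsets = {}  # consumer group -> next offset to read

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record

    def poll(self, group):
        start = self.offsets.get(group, 0)
        batch = self.records[start:]
        self.offsets[group] = len(self.records)
        return batch

    def seek(self, group, offset):
        self.offsets[group] = offset


log = Log()
for event in ("e1", "e2", "e3"):
    log.append(event)

first_read = log.poll("analytics")
other_group = log.poll("billing")   # an independent group sees everything
log.seek("analytics", 0)
replayed = log.poll("analytics")    # rewinding the offset replays history

queue = deque(["e1", "e2", "e3"])
taken = queue.popleft()             # consuming removes the message
print(first_read, other_group, replayed, taken, list(queue))
```

In the log model each group keeps its own offset, so several applications can read the same data and any of them can rewind; in the queue model a message is handed to one consumer and then disappears.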

Q: Do I still need ZooKeeper? A: Not for new clusters. KRaft mode has been production-ready since Kafka 3.3, ZooKeeper mode was deprecated in 3.5, and Kafka 4.0 removes it entirely, which simplifies deployment.

Q: How is the performance? A: A single broker can handle hundreds of thousands of messages per second, depending on message size, hardware, and configuration. LinkedIn has reported more than 7 trillion messages per day across its Kafka deployment.
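
To put the daily figure in perspective, the short calculation below converts it into an average per-second rate. The per-broker rate of 200,000 msg/s is an assumption picked from the "hundreds of thousands" range purely for illustration; the resulting broker count ignores replication, peak load, and headroom, so it is a lower bound rather than a sizing recommendation.

```python
MESSAGES_PER_DAY = 7_000_000_000_000  # figure quoted in the answer above
SECONDS_PER_DAY = 24 * 60 * 60

avg_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY
print(f"average rate: {avg_per_second:,.0f} msg/s")

# Hypothetical per-broker throughput, chosen only for illustration.
ASSUMED_BROKER_RATE = 200_000

# Ceiling division: brokers needed to carry the average rate alone.
brokers_needed = -(-MESSAGES_PER_DAY // (SECONDS_PER_DAY * ASSUMED_BROKER_RATE))
print(f"brokers at {ASSUMED_BROKER_RATE:,} msg/s each: {brokers_needed}")
```

The average works out to roughly 81 million messages per second, which is why workloads at this scale are spread over many clusters and brokers.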
