# Apache Kafka — Distributed Event Streaming Platform

> Apache Kafka is the open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, and mission-critical applications. Companies such as LinkedIn, Netflix, and Uber run trillions of messages per day through it.

## Install

Save the content below to `.claude/skills/` or append it to your `CLAUDE.md`:

## Quick Use

Start Kafka locally in KRaft mode. KRaft has been production-ready since 3.3, so no ZooKeeper is needed:

```bash
# Download (older releases move from downloads.apache.org to archive.apache.org)
curl -O https://archive.apache.org/dist/kafka/3.7.0/kafka_2.13-3.7.0.tgz
tar xzf kafka_2.13-3.7.0.tgz && cd kafka_2.13-3.7.0

# Generate a cluster ID and format the storage directories
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c config/kraft/server.properties

# Start the broker (runs in the foreground)
bin/kafka-server-start.sh config/kraft/server.properties
```

Produce and consume (in a second terminal):

```bash
# Create a topic
bin/kafka-topics.sh --create --topic orders --bootstrap-server localhost:9092

# Producer: type one message per line at the ">" prompt, Ctrl-C to exit
bin/kafka-console-producer.sh --topic orders --bootstrap-server localhost:9092
# > {"id": 1, "amount": 49.99}

# Consumer: read the topic starting from the first retained offset
bin/kafka-console-consumer.sh --topic orders --from-beginning --bootstrap-server localhost:9092
```

## Intro

Apache Kafka is a distributed event streaming platform originally created at LinkedIn by Jay Kreps, Neha Narkhede, and Jun Rao, and open-sourced in 2011. It was later donated to the Apache Software Foundation, where it is a top-level project. Kafka powers data pipelines at thousands of companies, handling trillions of messages per day.
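
The Quick Use commands treat `orders` as an append-only log. The producer appends records, and `--from-beginning` makes the consumer start at the first retained offset rather than at the end. The following is a toy, in-memory sketch of that model, assuming Python 3.9+. `ToyTopic`, `ToyConsumer`, and their methods are hypothetical names, not Kafka's client API. Keyed records are assigned to partitions by hashing the key. This sketch uses `zlib.crc32` as a stand-in hash; Kafka's Java client uses murmur2.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Hypothetical in-memory topic: N partitions, each an append-only list.

    Illustration only, not Kafka's API. zlib.crc32 is a stand-in hash;
    Kafka's Java client hashes keys with murmur2.
    """
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self) -> None:
        self.partitions = [[] for _ in range(self.num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Same key -> same partition (for a fixed partition count), preserving per-key order.
        return zlib.crc32(key) % self.num_partitions

    def append(self, key: bytes, value: str) -> tuple[int, int]:
        """Append a record and return (partition, offset), like a produce acknowledgement."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, offset: int, max_records: int = 100) -> list:
        """Return records starting at `offset`; nothing is removed from the log."""
        return self.partitions[partition][offset:offset + max_records]


class ToyConsumer:
    """Hypothetical consumer that tracks its own next offset in one partition."""

    def __init__(self, topic: ToyTopic, partition: int, from_beginning: bool = True) -> None:
        self.topic = topic
        self.partition = partition
        # True mimics --from-beginning; False starts at the current log end, like the
        # console consumer's default when no offset has been committed.
        self.offset = 0 if from_beginning else len(topic.partitions[partition])

    def poll(self) -> list:
        records = self.topic.read(self.partition, self.offset)
        self.offset += len(records)  # advance past what was returned
        return records


topic = ToyTopic(num_partitions=3)
p1, o1 = topic.append(b"customer-42", '{"id": 1, "amount": 49.99}')
p2, o2 = topic.append(b"customer-42", '{"id": 2, "amount": 5.00}')

replay = ToyConsumer(topic, p1)                      # starts at offset 0
tail = ToyConsumer(topic, p1, from_beginning=False)  # starts at offset 2
print([value for _, value in replay.poll()])
print(tail.poll())
```

Reading never deletes records, so any number of consumers can replay the same partition from offset 0, each at its own position. A queue that removes messages on acknowledgement does not allow this.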

- **Repo**: https://github.com/apache/kafka
- **Stars**: 32K+
- **Language**: Java + Scala
- **License**: Apache 2.0

## What Kafka Does

- **Publish and subscribe** — producers write, consumers read
- **Topics and partitions** — horizontally scalable logs
- **Persistence** — durable disk storage with configurable retention
- **Replication** — per-partition replicas across brokers
- **Consumer groups** — parallel consumption with automatic rebalancing
- **Streams API** — stateful stream processing
- **Connect** — pre-built integrations (JDBC, S3, Elastic, etc.)
- **Exactly-once** — transactional semantics
- **KRaft** — Raft-based metadata management (replaces ZooKeeper)

## Architecture

Brokers form a cluster, and each broker holds replicas of some partitions. Producers write records to partitions. A record with a key is assigned to a partition by hashing the key, so the same key always lands in the same partition as long as the partition count does not change. Consumers pull from partitions and track their position as offsets. In KRaft mode, controller nodes manage cluster metadata through a Raft quorum instead of ZooKeeper.

## Self-Hosting

```yaml
# Docker Compose (single node acting as both KRaft controller and broker)
version: "3"
services:
  kafka:
    image: bitnami/kafka:3.7
    ports:
      - "9092:9092"
    environment:
      KAFKA_CFG_NODE_ID: 1
      KAFKA_CFG_PROCESS_ROLES: controller,broker
      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
```

## Key Features

- Distributed commit log
- Horizontal scale via partitions
- Replication for durability
- Consumer groups for parallelism
- Exactly-once transactional semantics
- Kafka Connect ecosystem
- Kafka Streams for stateful processing
- KRaft mode (no ZooKeeper)
- MirrorMaker for cross-cluster replication
- Schema Registry (from Confluent, not part of Apache Kafka)

## Comparison

| System | Model | Durability | Ecosystem |
|---|---|---|---|
| Kafka | Distributed log | Disk + replicas | Largest |
| Redpanda | Kafka-compatible (C++) | Disk + replicas | Kafka-compatible |
| Pulsar | Segmented storage | BookKeeper | Growing |
| NATS JetStream | Streaming | Disk | Simpler |
| RabbitMQ | Traditional MQ | Persistent queues | Mature |

## FAQ

**Q: How does Kafka differ from RabbitMQ?**
A: Kafka is a distributed log. Messages are persisted and retained by time or size, consumers track their own offsets, and throughput is high. RabbitMQ is a traditional message queue with FIFO delivery, per-message acknowledgement, and flexible routing. Choose Kafka for streaming data and event sourcing; choose RabbitMQ for asynchronous task queues.

**Q: Do I still need ZooKeeper?**
A: Not for new clusters. KRaft mode was declared production-ready in 3.3, ZooKeeper mode was deprecated in 3.5, and it was removed entirely in 4.0. Running without ZooKeeper simplifies deployment.

**Q: How fast is it?**
A: A single broker can handle hundreds of thousands of messages per second, depending on message size, hardware, and configuration. LinkedIn has reported handling 7 trillion messages per day across its Kafka clusters.

## Sources

- Docs: https://kafka.apache.org/documentation
- GitHub: https://github.com/apache/kafka
- License: Apache 2.0

---

Source: https://tokrepo.com/en/workflows/apache-kafka-distributed-event-streaming-platform-a2aa8afb

Author: Apache Software Foundation