# Apache Kafka — Distributed Event Streaming Platform

> Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, and mission-critical applications. Large deployments at companies such as LinkedIn, Netflix, and Uber handle trillions of messages per day.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use

Start Kafka locally in KRaft mode (no ZooKeeper required; KRaft has been production-ready for new clusters since v3.3):
```bash
# Download (older releases such as 3.7.0 are served from the Apache archive)
curl -fLO https://archive.apache.org/dist/kafka/3.7.0/kafka_2.13-3.7.0.tgz
tar xzf kafka_2.13-3.7.0.tgz && cd kafka_2.13-3.7.0

# Generate a cluster ID and format the storage directory with it
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c config/kraft/server.properties

# Start broker
bin/kafka-server-start.sh config/kraft/server.properties
```

Produce and consume:
```bash
# Create topic
bin/kafka-topics.sh --create --topic orders --bootstrap-server localhost:9092

# Producer: type one message per line at the ">" prompt, then Ctrl-C to exit
bin/kafka-console-producer.sh --topic orders --bootstrap-server localhost:9092
> { "id": 1, "amount": 49.99 }

# Consumer
bin/kafka-console-consumer.sh --topic orders --from-beginning --bootstrap-server localhost:9092
```

## Intro

Apache Kafka is a distributed event streaming platform originally created at LinkedIn by Jay Kreps, Neha Narkhede, and Jun Rao and open-sourced in 2011. It was later donated to the Apache Software Foundation, which maintains it today. Kafka powers data pipelines at thousands of companies, handling trillions of messages per day.

- **Repo**: https://github.com/apache/kafka
- **Stars**: 32K+
- **Language**: Java + Scala
- **License**: Apache 2.0

## What Kafka Does

- **Publish and subscribe** — producers write, consumers read
- **Topics and partitions** — horizontally scalable logs
- **Persistence** — durable disk storage with configurable retention
- **Replication** — per-partition replicas across brokers
- **Consumer groups** — parallel consumption with auto rebalance
- **Streams API** — stateful stream processing
- **Connect** — pre-built integrations (JDBC, S3, Elastic, etc.)
- **Exactly-once** — transactional semantics
- **KRaft** — Raft-based metadata (replaces ZooKeeper)
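
The consumer-group bullet above can be illustrated with a small simulation. This is a minimal sketch, not Kafka's actual assignor: the function name `assign_partitions` and the plain round-robin strategy are assumptions made for illustration, and real clients use pluggable assignors with different balancing rules. It shows the core idea that each partition has exactly one owner within a group, and that a rebalance redistributes partitions when membership changes.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions round-robin so each one has exactly one owner in the group."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        owner = members[index % len(members)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3, 4, 5]

# Three members share six partitions.
before = assign_partitions(partitions, ["consumer-a", "consumer-b", "consumer-c"])

# consumer-c leaves the group; a rebalance hands its partitions to the survivors.
after = assign_partitions(partitions, ["consumer-a", "consumer-b"])

print(before)
print(after)
```

Because a partition is never shared inside a group, the partition count caps the useful parallelism: a seventh consumer in this example would receive nothing.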

## Architecture

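Brokers form a cluster, and each broker holds replicas of some partitions. Producers append records to partitions; a record with a key is routed by hashing that key, so records sharing a key land in the same partition and keep their relative order. Consumers pull from partitions and track their position as offsets. In KRaft mode (production-ready since v3.3), controller nodes manage cluster metadata instead of ZooKeeper.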
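
The write and read paths described above can be sketched in a few lines. This is an in-memory illustration under stated assumptions: `Partition`, `choose_partition`, and `poll` are hypothetical names, and `zlib.crc32` stands in for the hash used by Kafka's real partitioner, which is a different function. The point is only that hashing a key modulo the partition count keeps one key's records together, and that a consumer's offset is simply its next read position in that log.

```python
import zlib


class Partition:
    """An append-only list of records; a record's index is its offset."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to this record


def choose_partition(key, num_partitions):
    # Stand-in hash for illustration only; Kafka's partitioner uses a different function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def poll(partition, offset, max_records=10):
    """Return up to max_records starting at offset, plus the next offset to read."""
    batch = partition.records[offset:offset + max_records]
    return batch, offset + len(batch)


topic = [Partition() for _ in range(3)]
events = [
    ("customer-1", "created"),
    ("customer-2", "created"),
    ("customer-1", "paid"),
    ("customer-1", "shipped"),
]
for key, value in events:
    topic[choose_partition(key, len(topic))].append((key, value))

# Every record for customer-1 sits in one partition, in the order it was produced.
home = topic[choose_partition("customer-1", len(topic))]
position = 0
batch, position = poll(home, position, max_records=2)
print(batch, position)
```

Ordering is guaranteed only within a partition, which is why the key choice matters: events that must be processed in sequence should share a key.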

## Self-Hosting

```yaml
# docker-compose.yml (single broker, KRaft mode)
version: "3"
services:
  kafka:
    image: bitnami/kafka:3.7
    ports:
      - "9092:9092"
    environment:
      KAFKA_CFG_NODE_ID: 1
      KAFKA_CFG_PROCESS_ROLES: controller,broker
      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
```

## Key Features

- Distributed commit log
- Horizontal scale via partitions
- Replication for durability
- Consumer groups for parallelism
- Exactly-once transactional semantics
- Kafka Connect ecosystem
- Kafka Streams for stateful processing
- KRaft mode (no ZooKeeper)
- MirrorMaker for cross-cluster replication
- Schema Registry (a separate Confluent project, not part of Apache Kafka)

## Comparison

| System | Model | Durability | Ecosystem |
|---|---|---|---|
| Kafka | Distributed log | Disk + replicas | Largest |
| Redpanda | Kafka-compatible (C++) | Disk + replicas | Kafka-compatible |
| Pulsar | Segmented storage | BookKeeper | Growing |
| NATS JetStream | Streaming | Disk | Simpler |
| RabbitMQ | Traditional MQ | Persistent queues | Mature |

## FAQ

**Q: What's the difference vs RabbitMQ?**
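A: Kafka is a distributed log (persistent storage, time-based retention, high throughput) whose records stay available to any number of consumer groups until retention expires. RabbitMQ is a traditional message queue (FIFO queues, per-message acknowledgements, flexible routing) where a message is normally removed once it is consumed and acknowledged. Choose Kafka for streaming data and event sourcing; choose RabbitMQ for async task queues.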
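
The difference in delivery semantics can be made concrete with a toy model. This is a rough sketch, assuming two hypothetical classes (`Log` and `TaskQueue`) that mimic neither system's real API: the log keeps every record and stores one read position per consumer group, while the queue hands each message to a single consumer and then forgets it.

```python
from collections import deque


class Log:
    """Log semantics: records are retained; each group tracks its own offset."""

    def __init__(self):
        self.records = []
        self.committed = {}  # group name -> next offset to read

    def append(self, record):
        self.records.append(record)

    def read(self, group):
        offset = self.committed.get(group, 0)
        batch = self.records[offset:]
        self.committed[group] = len(self.records)
        return batch


class TaskQueue:
    """Queue semantics: a message is delivered once and then removed."""

    def __init__(self):
        self.messages = deque()

    def publish(self, message):
        self.messages.append(message)

    def consume(self):
        return self.messages.popleft() if self.messages else None


log = Log()
queue = TaskQueue()
for item in ["a", "b"]:
    log.append(item)
    queue.publish(item)

billing = log.read("billing")        # sees both records
analytics = log.read("analytics")    # an independent group sees them again
billing_again = log.read("billing")  # nothing new since the last read

first = queue.consume()   # one worker takes "a"
second = queue.consume()  # another worker takes "b"
third = queue.consume()   # the queue is now empty
```

In the log model a new consumer group can replay history from offset 0, which is what makes event sourcing practical; in the queue model work is divided among workers and is gone once handled.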

**Q: Do I still need ZooKeeper?**
A: Not for new clusters. KRaft mode has been production-ready since v3.3, ZooKeeper mode was deprecated in v3.5, and Kafka 4.0 removed ZooKeeper support entirely, which simplifies deployment.

**Q: How is the performance?**
A: A single broker can handle hundreds of thousands of messages per second, depending on hardware, message size, and configuration. LinkedIn has reported processing around 7 trillion messages per day across its Kafka clusters.

## Sources

- Docs: https://kafka.apache.org/documentation
- GitHub: https://github.com/apache/kafka
- License: Apache 2.0

---
Source: https://tokrepo.com/en/workflows/apache-kafka-distributed-event-streaming-platform-a2aa8afb
Author: Apache Software Foundation