Scripts · Apr 11, 2026 · 2 min read

Apache Kafka — Distributed Event Streaming Platform

Apache Kafka is the open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, and mission-critical applications. It processes trillions of messages per day at companies such as LinkedIn, Netflix, and Uber.

TL;DR
Apache Kafka handles publish-subscribe messaging at massive scale with durable partitioned logs, and KRaft mode removes the ZooKeeper dependency entirely.
§01

What it is

Apache Kafka is an open-source distributed event streaming platform originally created at LinkedIn and donated to the Apache Software Foundation. It handles publish-subscribe messaging, durable storage, and stream processing in a single platform. Written in Java and Scala, Kafka organizes data into topics split across partitions for horizontal scalability.
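The topic/partition split can be sketched in a few lines. Kafka's default partitioner hashes a record's key (with murmur2) modulo the partition count; the sketch below uses Python's built-in `hash` as a stand-in, and `assign_partition` is an illustrative name, not a Kafka API:

```python
def assign_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition (stand-in for Kafka's murmur2 hash)."""
    return hash(key) % num_partitions  # real clients use murmur2, not hash()

orders = [(b"user-1", "cart"), (b"user-2", "pay"), (b"user-1", "ship")]
parts = [assign_partition(key, 6) for key, _ in orders]

# Records with the same key always land on the same partition,
# which is what preserves per-key ordering as a topic scales out.
assert parts[0] == parts[2]
```

Because ordering is only guaranteed within a partition, choosing the key (user ID, order ID, ...) is effectively choosing the ordering boundary.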

Kafka is used by backend engineers building data pipelines, streaming analytics platforms, and event-driven microservice architectures. If your system needs to move large volumes of events reliably between producers and consumers, Kafka is the standard choice.

§02

How it saves time or tokens

Kafka eliminates the need to build custom message queuing and replay infrastructure. Its partitioned log model lets consumers rewind and replay events without any extra work by the producer. KRaft mode (production-ready since v3.3) removes the ZooKeeper dependency, cutting operational overhead by eliminating a separate coordination cluster. A single Kafka broker can sustain millions of messages per second, often with sub-millisecond latency, on commodity hardware.
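The replay property follows from the data model: a partition is an append-only log indexed by offset, and each consumer owns its own position in it. A minimal in-memory sketch (class and method names are illustrative, not Kafka's API):

```python
class PartitionLog:
    """Append-only log: reads never delete records, so any offset can be revisited."""
    def __init__(self):
        self.records = []

    def append(self, value) -> int:
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record

log = PartitionLog()
for event in ["created", "paid", "shipped"]:
    log.append(event)

offset = 0                          # the consumer, not the broker, tracks position
first_pass = log.records[offset:]   # read everything once
offset = len(log.records)

offset = 1                          # "seek" back: replay from offset 1 onward
replayed = log.records[offset:]
assert replayed == ["paid", "shipped"]
```

The producer wrote each event exactly once; rewinding was purely a consumer-side change of `offset`, which is why replay costs the producer nothing.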

§03

How to use

  1. Download and extract the Kafka binary distribution, then format storage with a generated cluster ID using KRaft mode.
  2. Start the broker with bin/kafka-server-start.sh config/kraft/server.properties (no ZooKeeper required).
  3. Create a topic, produce messages via the console producer, and read them back with the console consumer.
§04

Example

# Download Kafka
curl -O https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz
tar xzf kafka_2.13-3.7.0.tgz && cd kafka_2.13-3.7.0

# Format storage with KRaft
KAFKA_CLUSTER_ID=$(bin/kafka-storage.sh random-uuid)
bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties

# Start broker
bin/kafka-server-start.sh config/kraft/server.properties

# Create topic, produce, and consume
bin/kafka-topics.sh --create --topic orders --partitions 3 --bootstrap-server localhost:9092
bin/kafka-console-producer.sh --topic orders --bootstrap-server localhost:9092
bin/kafka-console-consumer.sh --topic orders --from-beginning --bootstrap-server localhost:9092
§05

Common pitfalls

  • Running ZooKeeper when KRaft mode is available wastes resources and adds operational complexity.
  • Setting the partition count too low at topic creation limits consumer parallelism; partitions can be added later but never removed, and adding them changes the key-to-partition mapping for keyed data.
  • Ignoring consumer group lag monitoring leads to silent data processing delays that surface only during incidents.
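The lag pitfall above can be made concrete: per-partition lag is the broker's log-end offset minus the group's committed offset. A toy calculation with made-up numbers (in practice these come from kafka-consumer-groups.sh or JMX metrics):

```python
# Offsets reported by the broker vs. offsets committed by the consumer group.
log_end_offsets = {0: 1500, 1: 1420, 2: 1610}
committed       = {0: 1500, 1:  900, 2: 1605}

# Lag per partition: how many records the group has yet to process.
lag = {p: log_end_offsets[p] - committed[p] for p in log_end_offsets}
total_lag = sum(lag.values())

assert lag == {0: 0, 1: 520, 2: 5}
assert total_lag == 525  # a steadily growing total is the signal to alert on
```

A single snapshot is less informative than the trend: stable lag means consumers are keeping pace, while monotonically growing lag means they are silently falling behind.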

Frequently Asked Questions

What is KRaft mode in Kafka?

KRaft is Kafka's built-in consensus protocol that replaces ZooKeeper for metadata management. Declared production-ready in Kafka v3.3, KRaft runs the controller as part of the Kafka process itself, removing the need to deploy and maintain a separate ZooKeeper cluster; ZooKeeper mode was deprecated in v3.5 and removed in v4.0.
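A minimal combined-mode KRaft configuration looks roughly like this (values are illustrative; the config/kraft/server.properties file shipped with Kafka is the authoritative template):

```properties
# One process acting as both broker and controller (fine for dev, not prod)
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER
log.dirs=/tmp/kraft-combined-logs
```

In production the broker and controller roles are typically split onto separate nodes, with an odd number of controllers forming the Raft quorum.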

How does Kafka differ from traditional message queues?

Kafka stores messages in durable, append-only logs with configurable retention. Consumers pull messages at their own pace and can replay from any offset. Traditional queues delete messages after delivery. Kafka also supports multiple independent consumer groups reading the same topic.
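The consumer-group point can be sketched directly: every group keeps its own offset into the same shared log, so one group's reads never affect another's (in-memory model, not Kafka's API):

```python
topic = ["e0", "e1", "e2", "e3"]           # shared append-only log
offsets = {"billing": 0, "analytics": 0}    # one committed offset per group

def poll(group: str, max_records: int) -> list:
    """Return the next batch for this group and advance only its own offset."""
    start = offsets[group]
    batch = topic[start:start + max_records]
    offsets[group] = start + len(batch)     # commit; the log itself is untouched
    return batch

assert poll("billing", 4) == ["e0", "e1", "e2", "e3"]
assert poll("analytics", 2) == ["e0", "e1"]  # unaffected by billing's reads
assert len(topic) == 4                       # nothing was deleted on delivery
```

A traditional queue would have removed `e0` after the first delivery; here both groups see the full stream, each at its own pace.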

What programming languages have Kafka client libraries?

Kafka has official clients for Java. Community-maintained clients exist for Python (confluent-kafka-python), Go (confluent-kafka-go, segmentio/kafka-go), Node.js (kafkajs), C/C++ (librdkafka), and .NET. The Confluent ecosystem provides tested clients for most languages.

How do I monitor Kafka in production?

Kafka exposes JMX metrics for broker health, topic throughput, and consumer lag. Tools like Kafka UI, AKHQ, or Confluent Control Center provide dashboards. The critical metric to watch is consumer group lag, which shows how far behind consumers are from the latest produced offset.

Can Kafka handle exactly-once message delivery?

Kafka supports exactly-once semantics (EOS) through idempotent producers and transactional APIs introduced in v0.11. When configured with enable.idempotence=true and transactional.id, producers guarantee no duplicates. Kafka Streams also provides exactly-once processing guarantees.
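The idempotence half of EOS can be sketched as broker-side deduplication: the broker remembers the highest sequence number appended per producer ID and partition, and silently acknowledges retries it has already written. A simplified model (class and field names are illustrative, not Kafka internals):

```python
class Partition:
    """Broker-side dedup: append a record only if its sequence number is new."""
    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer_id -> highest sequence number appended

    def append(self, producer_id: int, seq: int, value) -> bool:
        if self.last_seq.get(producer_id, -1) >= seq:
            return False    # duplicate retry: acknowledge without re-appending
        self.log.append(value)
        self.last_seq[producer_id] = seq
        return True

p = Partition()
assert p.append(producer_id=7, seq=0, value="order-1")
assert p.append(producer_id=7, seq=1, value="order-2")
assert not p.append(producer_id=7, seq=1, value="order-2")  # network retry
assert p.log == ["order-1", "order-2"]                      # no duplicate stored
```

This is why a timed-out send can be retried safely with enable.idempotence=true: the retry carries the same sequence number, so at most one copy lands in the log.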
