Apache Kafka — Distributed Event Streaming Platform
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, and mission-critical applications, processing trillions of messages per day at companies such as LinkedIn, Netflix, and Uber.
What it is
Apache Kafka is an open-source distributed event streaming platform originally created at LinkedIn and donated to the Apache Software Foundation. It handles publish-subscribe messaging, durable storage, and stream processing in a single platform. Written in Java and Scala, Kafka organizes data into topics split across partitions for horizontal scalability.
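Partition assignment is what makes topics horizontally scalable: records with a key are hashed to a partition, so all events for a key stay in order on one partition. A minimal sketch of the idea in Python (Kafka's Java producer actually uses a murmur2 hash; the stdlib hash here is only illustrative, not wire-compatible):

```python
# Simplified sketch of key-based partition assignment.
# Kafka's default producer uses murmur2; we use a stable stdlib
# hash (md5) purely to illustrate the mechanism.
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key deterministically to a partition."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All records with the same key land on the same partition,
# which preserves per-key ordering across the topic.
p1 = partition_for(b"customer-42", 6)
p2 = partition_for(b"customer-42", 6)
assert p1 == p2
```

Because the mapping depends on the partition count, changing the count later reroutes keys to different partitions.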
Kafka is used by backend engineers building data pipelines, streaming analytics platforms, and event-driven microservice architectures. If your system needs to move large volumes of events reliably between producers and consumers, Kafka is the standard choice.
How it saves time or tokens
Kafka eliminates the need to build custom message queuing and replay infrastructure. Its partitioned log model means consumers can rewind and replay events without the producer doing extra work. KRaft mode (production-ready since v3.3) removes the ZooKeeper dependency, cutting operational overhead by eliminating a separate coordination cluster. A well-tuned Kafka cluster can sustain millions of messages per second with low latency on commodity hardware.
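The replay property falls out of the log model itself: each partition is an append-only sequence, and each consumer tracks its own offset into it. A minimal sketch (hypothetical class, not a Kafka API):

```python
# Minimal sketch of Kafka's partitioned-log model: an append-only
# list per partition, with consumers tracking their own offsets.
class PartitionLog:
    def __init__(self):
        self._records = []

    def append(self, value):
        """Producer side: append a record and return its offset."""
        self._records.append(value)
        return len(self._records) - 1

    def read_from(self, offset):
        """Consumer side: replay every record from a given offset."""
        return self._records[offset:]

log = PartitionLog()
for event in ["created", "paid", "shipped"]:
    log.append(event)

# A consumer can rewind and replay without any producer involvement.
assert log.read_from(0) == ["created", "paid", "shipped"]
assert log.read_from(2) == ["shipped"]
```

Since the broker never deletes a record on delivery (only on retention expiry), any number of consumer groups can read the same partition independently, each at its own offset.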
How to use
- Download and extract the Kafka binary distribution, then format storage with a generated cluster ID using KRaft mode.
- Start the broker with bin/kafka-server-start.sh config/kraft/server.properties -- no ZooKeeper required.
- Create a topic, produce messages via the console producer, and read them back with the console consumer.
Example
# Download Kafka
curl -O https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz
tar xzf kafka_2.13-3.7.0.tgz && cd kafka_2.13-3.7.0
# Format storage with KRaft
KAFKA_CLUSTER_ID=$(bin/kafka-storage.sh random-uuid)
bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties
# Start broker
bin/kafka-server-start.sh config/kraft/server.properties
# Create topic, produce, and consume
bin/kafka-topics.sh --create --topic orders --partitions 3 --bootstrap-server localhost:9092
bin/kafka-console-producer.sh --topic orders --bootstrap-server localhost:9092
bin/kafka-console-consumer.sh --topic orders --from-beginning --bootstrap-server localhost:9092
Related on TokRepo
- Automation tools -- browse other event-driven and pipeline automation assets
- Self-hosted tools -- Kafka pairs well with self-hosted data infrastructure
Common pitfalls
- Running ZooKeeper when KRaft mode is available wastes resources and adds operational complexity.
- Setting partition count too low at topic creation limits consumer parallelism; partitions can be added later but never removed, and adding them changes the key-to-partition mapping.
- Ignoring consumer group lag monitoring leads to silent data processing delays that surface only during incidents.
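The lag metric behind the last pitfall is simple arithmetic: per partition, lag is the log-end offset minus the group's committed offset. A small sketch of the calculation (hypothetical function and offsets, for illustration):

```python
# Sketch of the consumer-lag metric: per-partition lag is the
# log-end offset minus the consumer group's committed offset.
def consumer_lag(log_end_offsets, committed_offsets):
    """Return per-partition lag and the total across partitions."""
    lag = {p: log_end_offsets[p] - committed_offsets.get(p, 0)
           for p in log_end_offsets}
    return lag, sum(lag.values())

end = {0: 1500, 1: 980}        # latest produced offsets (hypothetical)
committed = {0: 1500, 1: 700}  # offsets the group has committed
per_partition, total = consumer_lag(end, committed)
assert per_partition == {0: 0, 1: 280}
assert total == 280  # partition 1 is falling behind
```

In production these numbers come from tools like kafka-consumer-groups.sh or JMX metrics; alerting on a growing total catches processing delays before they become incidents.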
Frequently Asked Questions
What is KRaft?
KRaft is Kafka's built-in consensus protocol that replaces ZooKeeper for metadata management. Production-ready since Kafka v3.3, KRaft runs the controller as part of the Kafka process itself, removing the need to deploy and maintain a separate ZooKeeper cluster.
How is Kafka different from a traditional message queue?
Kafka stores messages in durable, append-only logs with configurable retention. Consumers pull messages at their own pace and can replay from any offset, whereas traditional queues delete messages after delivery. Kafka also supports multiple independent consumer groups reading the same topic.
Which languages have Kafka clients?
Kafka ships an official Java client. Well-established clients exist for Python (confluent-kafka-python), Go (confluent-kafka-go, segmentio/kafka-go), Node.js (kafkajs), C/C++ (librdkafka), and .NET. The Confluent ecosystem provides tested clients for most languages.
How do I monitor Kafka?
Kafka exposes JMX metrics for broker health, topic throughput, and consumer lag. Tools like Kafka UI, AKHQ, or Confluent Control Center provide dashboards. The critical metric to watch is consumer group lag, which shows how far behind consumers are from the latest produced offset.
Does Kafka support exactly-once delivery?
Kafka supports exactly-once semantics (EOS) through idempotent producers and transactional APIs introduced in v0.11. When configured with enable.idempotence=true and transactional.id, producers guarantee no duplicates. Kafka Streams also provides exactly-once processing guarantees.
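The idempotence mechanism can be sketched simply: the broker remembers the highest sequence number appended per producer and partition, and silently drops retried writes it has already seen. A toy model of that bookkeeping (hypothetical class, greatly simplified versus the real protocol):

```python
# Sketch of idempotent-producer deduplication: the broker tracks
# the last sequence number per (producer_id, partition) and drops
# retries whose sequence it has already appended.
class Broker:
    def __init__(self):
        self.log = []
        self.last_seq = {}  # (producer_id, partition) -> last sequence

    def append(self, producer_id, partition, sequence, value):
        key = (producer_id, partition)
        if self.last_seq.get(key, -1) >= sequence:
            return False  # duplicate retry, dropped without appending
        self.last_seq[key] = sequence
        self.log.append(value)
        return True

b = Broker()
assert b.append("p1", 0, 0, "order-1") is True
assert b.append("p1", 0, 0, "order-1") is False  # retry deduplicated
assert b.log == ["order-1"]
```

This is why a network timeout plus retry does not create a duplicate record when enable.idempotence=true: the retried batch carries the same sequence number and is rejected as already written.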
Citations (3)
- Apache Kafka GitHub — Apache Kafka distributed event streaming platform
- Apache Kafka Documentation — KRaft mode removes ZooKeeper dependency
- Confluent Kafka Documentation — Exactly-once semantics via idempotent producers