Introduction
Apache Pulsar was originally developed at Yahoo to handle its large-scale messaging needs across multiple data centers. It separates the serving layer (brokers) from the storage layer (Apache BookKeeper), so compute and storage can scale independently. This architecture gives Pulsar capabilities that Kafka historically lacked or provides only through separate tooling: native multi-tenancy, built-in geo-replication, and tiered storage.
With over 15,000 GitHub stars, Pulsar is used by Tencent, Verizon Media, Overstock, and other organizations that need multi-tenant messaging, cross-datacenter replication, or unified messaging and streaming.
What Pulsar Does
Pulsar provides pub/sub messaging with guaranteed delivery, message queuing with multiple subscription modes (exclusive, shared, failover, key-shared), and streaming with message replay. Its multi-layered architecture separates stateless brokers from stateful storage, enabling elastic scaling.
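The four subscription modes are easiest to compare side by side. Below is a toy, in-memory model of how one subscription could hand messages to its consumers. It is a simplified illustration and not Pulsar's broker logic. The `dispatch` function and its mode strings are hypothetical. Shared mode uses strict round-robin here, while the real broker dispatches according to each consumer's available permits. Key-shared mode uses modulo hashing here, while Pulsar assigns key-hash ranges to consumers.

```python
from collections import defaultdict
from zlib import crc32


def dispatch(mode, consumers, messages):
    """Toy model of one subscription delivering messages to its consumers.

    mode: "exclusive", "failover", "shared", or "key_shared" (hypothetical names)
    consumers: consumer names, in the order they subscribed
    messages: list of (key, payload) tuples, in publish order
    Returns {consumer: [payloads received, in order]}.
    """
    if not consumers:
        raise ValueError("a subscription needs at least one consumer")
    received = defaultdict(list)
    if mode == "exclusive":
        # Only one consumer may attach; a second subscribe attempt is rejected.
        if len(consumers) > 1:
            raise ValueError("exclusive subscription already has a consumer")
        for _, payload in messages:
            received[consumers[0]].append(payload)
    elif mode == "failover":
        # One active consumer (the first here) gets everything; the rest are standbys.
        for _, payload in messages:
            received[consumers[0]].append(payload)
    elif mode == "shared":
        # Messages are spread across all consumers (strict round-robin here),
        # so no single consumer sees the full ordered stream.
        for i, (_, payload) in enumerate(messages):
            received[consumers[i % len(consumers)]].append(payload)
    elif mode == "key_shared":
        # Every message with the same key goes to the same consumer,
        # preserving per-key order while still spreading load.
        for key, payload in messages:
            owner = consumers[crc32(key.encode()) % len(consumers)]
            received[owner].append(payload)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return dict(received)


msgs = [("user-1", "a"), ("user-2", "b"), ("user-1", "c"), ("user-3", "d")]
print(dispatch("shared", ["c1", "c2"], msgs))    # {'c1': ['a', 'c'], 'c2': ['b', 'd']}
print(dispatch("failover", ["c1", "c2"], msgs))  # {'c1': ['a', 'b', 'c', 'd']}
```

The contrast to notice is that shared mode maximizes parallelism at the cost of ordering, whereas key-shared keeps messages with the same key ("a" and "c" above) on one consumer.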
Architecture Overview
```
[Producers] --> [Pulsar Brokers]
                (stateless, scalable)
                       |
              [Topic Lookup & Routing]
                       |
              [Apache BookKeeper]
              (storage layer, durable)
              Write-Ahead Log entries
                       |
              [Tiered Storage]
              Offload old data to
              S3, GCS, Azure Blob
                       |
[Consumers] <-- [Subscription Modes]
                Exclusive, Shared,
                Failover, Key-Shared

[Multi-Tenancy]
Tenants -> Namespaces -> Topics
Isolation, quotas, policies
```
Self-Hosting & Configuration
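```python
# Python producer and consumer (pip install pulsar-client)
import pulsar

# Producer: publish 10 messages to a persistent topic
client = pulsar.Client("pulsar://localhost:6650")
producer = client.create_producer("persistent://public/default/my-topic")
for i in range(10):
    producer.send(f"Message {i}".encode())  # synchronous send, waits for the broker ack
print("Messages sent")
client.close()

# Consumer with a shared subscription
client = pulsar.Client("pulsar://localhost:6650")
consumer = client.subscribe(
    "persistent://public/default/my-topic",
    subscription_name="my-sub",
    consumer_type=pulsar.ConsumerType.Shared,
    # A new subscription starts at the latest message by default;
    # start from the earliest so the messages sent above are delivered.
    initial_position=pulsar.InitialPosition.Earliest,
)
try:
    while True:
        msg = consumer.receive()
        try:
            print(f"Received: {msg.data().decode()}")
            consumer.acknowledge(msg)
        except Exception:
            # Ask the broker to redeliver this message later
            consumer.negative_acknowledge(msg)
finally:
    client.close()
```
Key Features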
- Multi-Tenancy — native tenant, namespace, and topic isolation
- Geo-Replication — built-in cross-datacenter replication
- Tiered Storage — automatically offload old messages to S3/GCS
- Multiple Subscriptions — exclusive, shared, failover, and key-shared
- Schema Registry — built-in schema evolution and enforcement
- Pulsar Functions — lightweight serverless processing on streams
- Transactions — exactly-once semantics across multiple topics
- Stateless Brokers — independent scaling of compute and storage
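Tiered storage works at the granularity of BookKeeper ledgers. Once a topic's data in BookKeeper grows past the namespace's offload threshold, older closed ledgers are copied to object storage and later removed from BookKeeper. The toy function below sketches that size-based selection rule. `ledgers_to_offload` is hypothetical and ignores time-based and manual offload. It assumes the last ledger in the list is the one still open for writes, which is never offloaded.

```python
def ledgers_to_offload(ledger_sizes, threshold_bytes):
    """Pick which ledgers to offload, oldest first (simplified model).

    ledger_sizes: byte size of each ledger of a topic, oldest first;
                  the last entry is the ledger currently open for writes.
    threshold_bytes: how much data may stay in BookKeeper.
    Returns the indexes of closed ledgers to offload.
    """
    remaining = sum(ledger_sizes)
    selected = []
    # The open (last) ledger is still being written and cannot be offloaded.
    for index, size in enumerate(ledger_sizes[:-1]):
        if remaining <= threshold_bytes:
            break
        selected.append(index)
        remaining -= size
    return selected


MB = 1024 * 1024
sizes = [100 * MB, 100 * MB, 100 * MB, 50 * MB]  # three closed ledgers + one open
print(ledgers_to_offload(sizes, 200 * MB))  # [0, 1]
```

Offloading whole closed ledgers keeps the hot, recently written data on bookies. Consumers replaying old messages read them transparently from object storage.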
Comparison with Similar Tools
| Feature | Pulsar | Kafka | RabbitMQ | NATS | Redpanda |
|---|---|---|---|---|---|
| Architecture | Broker + BookKeeper | Brokers with local log storage | Single process | Single binary | Single binary |
| Multi-Tenancy | Native | Manual | Vhosts | Accounts | Limited |
| Geo-Replication | Built-in | MirrorMaker | Federation | JetStream | No |
| Tiered Storage | Built-in | KIP-405 (3.6+), pluggable backend | No | No | Yes |
| Queue + Stream | Both native | Stream-first | Queue-first | Both | Stream-first |
| Scaling | Independent compute/storage | Coupled | Vertical | Horizontal | Coupled |
| Best For | Multi-tenant, geo-distributed | High-throughput streaming | Task queues | Lightweight messaging | Kafka replacement |
FAQ
Q: When should I choose Pulsar over Kafka? A: Choose Pulsar when you need multi-tenancy, geo-replication, tiered storage, or both queuing and streaming on one platform. Choose Kafka when you want very high throughput, the largest ecosystem, or a managed Kafka service (such as Confluent) that fits your needs.
Q: Is Pulsar harder to operate than Kafka? A: Pulsar has more components (brokers, BookKeeper bookies, and a metadata store, traditionally ZooKeeper), which adds operational complexity. However, separating compute from storage makes scaling easier. Managed services such as StreamNative reduce the operational burden.
Q: What are Pulsar Functions? A: Lightweight serverless functions that process messages on Pulsar topics, similar in spirit to AWS Lambda but integrated into the messaging system. They can be written in Java, Python, or Go.
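As a rough illustration, Pulsar's Python runtime accepts a plain "native" function. The runtime calls it once per message from the input topic(s) and publishes the return value to the output topic. The sketch below shows that shape only. The JSON order format and the `total` field are hypothetical, and deployment (for example with `pulsar-admin functions create`) is not shown. `json.loads` accepts both `str` and `bytes`, so the function does not depend on how the runtime deserializes the payload.

```python
import json


def process(input):
    """Native Python Pulsar Function: one input message in, one output out.

    Hypothetical payload: {"quantity": <int>, "unit_price": <float>}.
    Returns the order with a computed "total" field, serialized as JSON.
    """
    order = json.loads(input)  # accepts str or bytes
    order["total"] = round(order["quantity"] * order["unit_price"], 2)
    return json.dumps(order)


# Local check without a Pulsar cluster: call the function directly.
print(process('{"quantity": 3, "unit_price": 2.5}'))
# {"quantity": 3, "unit_price": 2.5, "total": 7.5}
```

Keeping the function free of Pulsar imports makes it unit-testable as ordinary Python before it is deployed.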
Q: Can I migrate from Kafka to Pulsar? A: Yes, gradually. The Kafka-on-Pulsar (KoP) protocol handler, an open-source project from StreamNative, lets Kafka clients connect to Pulsar brokers largely without code changes, so workloads can move over incrementally.
Sources
- GitHub: https://github.com/apache/pulsar
- Documentation: https://pulsar.apache.org/docs
- Originally created at Yahoo, Apache Top-Level Project
- License: Apache-2.0