Apache Pulsar — Cloud-Native Distributed Messaging and Streaming
Apache Pulsar is a cloud-native distributed messaging and streaming platform. It combines the best of traditional messaging (like RabbitMQ) with streaming (like Kafka) — providing multi-tenancy, geo-replication, and tiered storage in a single system.
Agent-ready installation
This asset can be installed after choosing the runtime, reviewing the plan, and running the corresponding command.
npx -y tokrepo@latest install 8d354adf-3734-11f1-9bc6-00163e2b0d79 --target codex
Run after confirming the plan with a dry run.
What it is
Apache Pulsar is a cloud-native distributed messaging and streaming platform that combines the capabilities of traditional message queues (like RabbitMQ) with event streaming (like Kafka) in a single system. It provides multi-tenancy, geo-replication, and tiered storage as built-in features rather than add-ons.
Pulsar is designed for platform teams and backend engineers who need a unified messaging layer that scales from simple pub-sub to complex event streaming without deploying separate systems for each use case.
How it saves time or tokens
Pulsar's architecture separates compute (brokers) from storage (BookKeeper), which means you can scale throughput and storage independently. This eliminates the rebalancing pain common with broker-storage-coupled systems. Multi-tenancy is built in, so a single Pulsar cluster can serve multiple teams with namespace-level isolation, reducing operational overhead.
The unified messaging model means you do not need to maintain separate Kafka clusters for streaming and RabbitMQ for queuing. One Pulsar cluster handles both patterns: whether a topic behaves like a queue or a stream is largely decided by the subscription type its consumers use.
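As a rough illustration of how one cluster can serve both patterns, the sketch below picks a subscription type with the Python client. The helper name `subscribe` and the pattern labels "queue" and "stream" are invented for this example; only `pulsar.ConsumerType` and `client.subscribe` come from the client library.

```python
# Sketch: choose a Pulsar subscription type per messaging pattern.
# "queue" and "stream" are labels made up for this example, not Pulsar terms.
SUBSCRIPTION_TYPE_FOR = {
    "queue": "Shared",      # messages are distributed across all consumers (work queue)
    "stream": "Exclusive",  # a single consumer reads the topic in order
}


def subscribe(client, topic, subscription, pattern):
    """Subscribe to `topic` with the subscription type matching `pattern`."""
    import pulsar  # imported lazily; requires `pip install pulsar-client`

    consumer_type = getattr(pulsar.ConsumerType, SUBSCRIPTION_TYPE_FOR[pattern])
    return client.subscribe(topic, subscription, consumer_type=consumer_type)
```

With a client created as `pulsar.Client('pulsar://localhost:6650')`, calling `subscribe(client, 'my-topic', 'workers', 'queue')` from several processes would spread messages across them, while the "stream" pattern allows only one active consumer on the subscription.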
How to use
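- Start Pulsar with Docker:
docker run -d --name pulsar -p 6650:6650 -p 8080:8080 apachepulsar/pulsar:latest bin/pulsar standalone
- Produce a message (pulsar-client ships inside the container, so run it through docker exec):
docker exec -it pulsar bin/pulsar-client produce my-topic --messages 'hello pulsar'
- Consume messages (--num-messages 0 keeps consuming until interrupted; a new subscription starts at the latest message by default, so start the consumer before producing):
docker exec -it pulsar bin/pulsar-client consume my-topic -s my-sub --num-messages 0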
Example
# Start Pulsar standalone in Docker
docker run -d --name pulsar \
  -p 6650:6650 -p 8080:8080 \
  apachepulsar/pulsar:latest bin/pulsar standalone
# Produce messages (pulsar-client ships inside the container)
docker exec -it pulsar bin/pulsar-client produce my-topic --messages 'hello pulsar'
# Consume messages (--num-messages 0 keeps consuming until interrupted)
docker exec -it pulsar bin/pulsar-client consume my-topic -s my-subscription --num-messages 0
# Install the Python client
pip install pulsar-client
# Python producer (run as a separate .py script)
import pulsar
client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer('my-topic')
producer.send('hello from python'.encode('utf-8'))
client.close()
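The example above only produces from Python. A consuming counterpart might look like the sketch below; it assumes the same standalone broker, topic, and subscription name, and the helper `receive_n` is a name chosen for this illustration.

```python
def receive_n(consumer, n):
    """Receive and acknowledge n messages, returning their payloads as text."""
    payloads = []
    for _ in range(n):
        msg = consumer.receive()                 # blocks until a message arrives
        payloads.append(msg.data().decode("utf-8"))
        consumer.acknowledge(msg)                # mark as processed so it is not redelivered
    return payloads


def main():
    import pulsar  # imported lazily; requires `pip install pulsar-client`

    # Assumption: the standalone broker from the Docker example is running.
    client = pulsar.Client("pulsar://localhost:6650")
    consumer = client.subscribe("my-topic", "my-subscription")
    try:
        print(receive_n(consumer, 1))
    finally:
        client.close()
```

Call `main()` while the broker is up, then run the producer script. Keeping the loop in a function that takes the consumer as an argument makes it easy to exercise with a stub in tests.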
Common pitfalls
- Pulsar standalone mode is for development only; production deployments need a metadata store (traditionally a ZooKeeper cluster) and a BookKeeper ensemble, which adds operational complexity.
- The broker-storage separation is powerful but means more moving parts to monitor; invest in observability (Prometheus metrics are built in) from day one.
- Client library support varies by language; Java and Python clients are most mature, while Go and Node.js clients may lag in feature parity.
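To get a first look at the built-in Prometheus metrics mentioned above, a small script can pull and parse the broker's text endpoint. This is a minimal sketch: it assumes the standalone broker from the Docker example exposes metrics at http://localhost:8080/metrics/, and the parser handles only simple sample lines rather than the full exposition format.

```python
from urllib.request import urlopen

# Assumption: standalone broker started as in the Docker example.
METRICS_URL = "http://localhost:8080/metrics/"


def parse_metrics(text):
    """Parse Prometheus text lines into (series, value) pairs.

    Comment lines (# HELP / # TYPE) are skipped; an optional trailing
    timestamp after the value is ignored.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The series name ends at the closing brace of its labels,
        # or at the first space when there are no labels.
        end = line.rindex("}") + 1 if "}" in line else line.index(" ")
        series, rest = line[:end], line[end:].split()
        samples.append((series, float(rest[0])))
    return samples


def fetch_metrics(url=METRICS_URL):
    """Download and parse the broker's metrics page."""
    with urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

In practice a Prometheus server would scrape this endpoint directly; the script is only a quick way to confirm that metrics are being exported.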
Frequently asked questions
How does Pulsar differ from Kafka?
Pulsar separates compute (brokers) from storage (BookKeeper), enabling independent scaling. Kafka couples brokers and storage, so scaling requires partition rebalancing. Pulsar also provides built-in multi-tenancy and geo-replication, for which Kafka requires additional tooling.
How does multi-tenancy work in Pulsar?
Pulsar isolates workloads by tenant and namespace within a cluster. Different teams or applications can share a single Pulsar cluster with independent topic namespaces, access controls, and resource quotas.
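Tenant and namespace show up directly in topic names, which follow the form persistent://tenant/namespace/topic; a short name such as my-topic resolves to persistent://public/default/my-topic. The helper below is a hypothetical convenience for building such names, not part of the client library.

```python
def topic_name(tenant, namespace, topic, persistent=True):
    """Build a fully qualified Pulsar topic name."""
    for part in (tenant, namespace, topic):
        if not part or "/" in part:
            raise ValueError(f"invalid topic name component: {part!r}")
    scheme = "persistent" if persistent else "non-persistent"
    return f"{scheme}://{tenant}/{namespace}/{topic}"
```

For example, `topic_name("public", "default", "my-topic")` yields the full name behind the `my-topic` shorthand used earlier, and two teams can use the same topic name under different tenants without colliding.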
Does Pulsar support exactly-once delivery?
Yes. Pulsar supports exactly-once semantics through transactional messaging. Producers can send messages within transactions, and consumers can acknowledge messages atomically as part of the same transaction, so a committed transaction leaves no duplicates or losses.
What is tiered storage?
Tiered storage automatically offloads older messages from BookKeeper to cheaper object storage (S3, GCS, Azure Blob). This lets you retain months or years of data without the cost of keeping it all on fast storage.
Does Pulsar include built-in stream processing?
Yes. Pulsar Functions is a lightweight compute framework for processing messages in flight. Functions can transform, route, or enrich messages without deploying a separate stream processing framework.
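As a small illustration, Pulsar Functions accepts a plain Python module that defines a `process` function (the language-native interface). The transformation below is an arbitrary example; deploying it with pulsar-admin and wiring input and output topics are not shown.

```python
def process(input):
    """Append an exclamation mark to each incoming message."""
    return f"{input}!"
```

Each message arriving on the function's input topic is passed to `process`, and the return value is published to its output topic.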