Scripts · April 13, 2026 · 1 min read

Apache Pulsar — Cloud-Native Distributed Messaging and Streaming

Apache Pulsar is a cloud-native distributed messaging and streaming platform. It combines traditional message queuing (as in RabbitMQ) with log-based streaming (as in Kafka), and provides multi-tenancy, geo-replication, and tiered storage in a single system.

Script Depot · Community
Quick Start


# Run with Docker
docker run -d --name pulsar \
  -p 6650:6650 -p 8080:8080 \
  apachepulsar/pulsar:latest bin/pulsar standalone

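# Consume messages (start this first, in a second terminal: a new subscription
# begins at the latest position, so it only sees messages published after it exists)
docker exec -it pulsar bin/pulsar-client consume my-topic -s "my-sub" -n 0

# Produce a message
docker exec -it pulsar bin/pulsar-client produce my-topic --messages "Hello Pulsar"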

# Python client
pip install pulsar-client

Introduction

Apache Pulsar was originally developed at Yahoo to handle its messaging workloads across multiple data centers. It separates the serving layer (brokers) from the storage layer (Apache BookKeeper), so compute and storage can scale independently. Building on this architecture, Pulsar offers native multi-tenancy, geo-replication, and tiered storage as built-in features, whereas Kafka typically relies on add-ons or external tooling such as MirrorMaker for comparable capabilities.

With over 15,000 GitHub stars, Pulsar is used by Tencent, Verizon Media, Overstock, and organizations that need multi-tenant messaging, cross-datacenter replication, or unified messaging and streaming.

What Pulsar Does

Pulsar provides pub/sub messaging with guaranteed delivery, message queuing with multiple subscription modes (exclusive, shared, failover, key-shared), and streaming with message replay. Its multi-layered architecture separates stateless brokers from stateful storage, enabling elastic scaling.
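
To make the four subscription modes concrete, here is a toy, in-memory model of how each one assigns messages to consumers. It is a sketch for intuition only, not Pulsar's dispatcher: the `dispatch` function, the consumer names, and the use of CRC32 for key hashing are all assumptions made for this illustration (the real broker uses its own hashing and hash-range assignment for key-shared subscriptions).

```python
import zlib
from collections import defaultdict

def dispatch(mode, consumers, messages):
    """Toy model: map (key, payload) messages to consumers for one subscription.

    Hypothetical helper for illustration; not part of the Pulsar client API.
    """
    assigned = defaultdict(list)
    for index, (key, payload) in enumerate(messages):
        if mode in ("exclusive", "failover"):
            # Exclusive: only a single consumer may attach.
            # Failover: one active consumer, the rest are standbys.
            # Either way, the first consumer receives everything here.
            target = consumers[0]
        elif mode == "shared":
            # Round-robin across consumers; per-key ordering is not preserved.
            target = consumers[index % len(consumers)]
        elif mode == "key_shared":
            # Same key -> same consumer, so ordering per key is preserved.
            target = consumers[zlib.crc32(key.encode("utf-8")) % len(consumers)]
        else:
            raise ValueError(f"unknown mode: {mode}")
        assigned[target].append(payload)
    return dict(assigned)

messages = [("user-1", "a"), ("user-2", "b"), ("user-1", "c"), ("user-2", "d")]
consumers = ["c1", "c2"]
for mode in ("exclusive", "failover", "shared", "key_shared"):
    print(mode, dispatch(mode, consumers, messages))
```

The practical takeaway is the trade-off: shared subscriptions spread load evenly but give up ordering, while key-shared subscriptions keep all messages for a given key on one consumer.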

Architecture Overview

[Producers] --> [Pulsar Brokers]
                (stateless, scalable)
                      |
              [Topic Lookup & Routing]
                      |
              [Apache BookKeeper]
              (storage layer, durable)
              Write-Ahead Log entries
                      |
              [Tiered Storage]
              Offload old data to
              S3, GCS, Azure Blob
                      |
[Consumers] <-- [Subscription Modes]
                Exclusive, Shared,
                Failover, Key-Shared

[Multi-Tenancy]
Tenants -> Namespaces -> Topics
Isolation, quotas, policies
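
The Tenants -> Namespaces -> Topics hierarchy shows up directly in topic names, which take the form `persistent://tenant/namespace/topic`; a bare name such as `my-topic` resolves to `persistent://public/default/my-topic`. The following is a minimal sketch of that naming scheme. `TopicName` and `parse_topic` are hypothetical names introduced here, not the client's own name handling, and the sketch deliberately covers only bare and fully qualified names.

```python
from typing import NamedTuple

class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str

def parse_topic(name: str) -> TopicName:
    """Split a Pulsar topic name into its parts (hypothetical helper)."""
    if "://" not in name:
        if "/" in name:
            raise ValueError("this sketch only handles bare or fully qualified names")
        # Bare names fall back to the default tenant and namespace.
        return TopicName("persistent", "public", "default", name)
    domain, _, rest = name.partition("://")
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown domain: {domain}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic, got: {rest}")
    return TopicName(domain, *parts)

print(parse_topic("my-topic"))
print(parse_topic("persistent://public/default/my-topic"))
```

Both calls describe the same topic, which is why the CLI quick start can use `my-topic` while the Python client uses the full name.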

Self-Hosting & Configuration

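# Python producer and consumer
import pulsar

# Producer
client = pulsar.Client("pulsar://localhost:6650")
producer = client.create_producer("persistent://public/default/my-topic")
for i in range(10):
    producer.send(f"Message {i}".encode("utf-8"))
print("Messages sent")
client.close()  # also closes the producer

# Consumer with shared subscription
client = pulsar.Client("pulsar://localhost:6650")
consumer = client.subscribe(
    "persistent://public/default/my-topic",
    subscription_name="my-sub",
    consumer_type=pulsar.ConsumerType.Shared,
    # A new subscription defaults to the latest position and would skip the
    # messages sent above; start from the earliest message still available.
    initial_position=pulsar.InitialPosition.Earliest,
)
try:
    while True:
        msg = consumer.receive()  # blocks until a message arrives
        print(f"Received: {msg.data().decode('utf-8')}")
        consumer.acknowledge(msg)
except KeyboardInterrupt:
    pass  # stop on Ctrl+C
finally:
    client.close()  # also closes the consumer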
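
The consumer loop above acknowledges every message unconditionally. When processing can fail, one common pattern is to acknowledge on success and negatively acknowledge on failure so that the broker redelivers the message later. The sketch below assumes a consumer object exposing `receive(timeout_millis=...)`, `acknowledge`, and `negative_acknowledge`, as the Python client's `Consumer` does. `drain` and `handler` are hypothetical names, and the timeout handling is simplified: any exception from `receive` is treated as "no more messages", which real code should narrow to the client's timeout exception.

```python
def drain(consumer, handler, timeout_millis=1000):
    """Process messages until receive() fails; return (acked, nacked) counts.

    `consumer` is any object with receive/acknowledge/negative_acknowledge,
    for example a pulsar.Consumer. `drain` itself is a hypothetical helper.
    """
    acked = nacked = 0
    while True:
        try:
            msg = consumer.receive(timeout_millis=timeout_millis)
        except Exception:
            # Simplification: treat any receive error (including a timeout)
            # as "nothing left to read" and stop.
            break
        try:
            handler(msg.data().decode("utf-8"))
        except Exception:
            # Processing failed: ask the broker to redeliver this message later.
            consumer.negative_acknowledge(msg)
            nacked += 1
        else:
            consumer.acknowledge(msg)
            acked += 1
    return acked, nacked
```

With a real client this might be called as `drain(consumer, print)`. Because the function only relies on three methods, it can also be exercised against a small fake consumer in unit tests without a running broker.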

Key Features

  • Multi-Tenancy — native tenant, namespace, and topic isolation
  • Geo-Replication — built-in cross-datacenter replication
  • Tiered Storage — automatically offload old messages to S3/GCS
  • Multiple Subscriptions — exclusive, shared, failover, and key-shared
  • Schema Registry — built-in schema evolution and enforcement
  • Pulsar Functions — lightweight serverless processing on streams
  • Transactions — exactly-once semantics across multiple topics
  • Stateless Brokers — independent scaling of compute and storage
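
As an illustration of the Pulsar Functions item in this list, here is a minimal sketch of a function body in the "native" Python style, where the runtime calls a module-level `process(input)` for each incoming message and publishes the return value to the output topic. The exclamation-mark logic is an arbitrary example, and deployment details (input and output topics, function name, packaging) are omitted because they depend on your cluster setup.

```python
def process(input):
    """Called once per incoming message; the return value becomes the output message."""
    return f"{input}!"

# Local check without a Pulsar cluster: call the function directly.
print(process("hello"))
```

Keeping the logic in a plain function like this means it can be unit-tested locally before it is deployed to a cluster.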

Comparison with Similar Tools

| Feature | Pulsar | Kafka | RabbitMQ | NATS | Redpanda |
|---|---|---|---|---|---|
| Architecture | Broker + BookKeeper | Broker + Storage | Single process | Single binary | Single binary |
| Multi-Tenancy | Native | Manual | Vhosts | Accounts | Limited |
| Geo-Replication | Built-in | MirrorMaker | Federation | JetStream | No |
| Tiered Storage | Built-in | Plugin | No | No | Yes |
| Queue + Stream | Both native | Stream-first | Queue-first | Both | Stream-first |
| Scaling | Independent compute/storage | Coupled | Vertical | Horizontal | Coupled |
| Best For | Multi-tenant, geo-distributed | High-throughput streaming | Task queues | Lightweight messaging | Kafka replacement |

FAQ

Q: Pulsar vs Kafka: when should I choose Pulsar? A: Choose Pulsar when you need multi-tenancy, geo-replication, tiered storage, or both queuing and streaming in one system. Choose Kafka when you need maximum throughput or the largest ecosystem, or when a managed offering such as Confluent fits your needs.

Q: Is Pulsar harder to operate than Kafka? A: Pulsar has more components (brokers + BookKeeper + ZooKeeper), which adds operational complexity. However, the separation of compute and storage makes scaling easier. Managed services (StreamNative) reduce operational burden.

Q: What are Pulsar Functions? A: Lightweight serverless functions that process messages on Pulsar topics — like AWS Lambda but integrated into the messaging system. Write in Java, Python, or Go.

Q: Can I migrate from Kafka to Pulsar? A: Pulsar provides a Kafka-compatible protocol handler (KoP) that lets Kafka clients connect to Pulsar without code changes. This enables gradual migration.
