Layered Architecture
Pulsar’s architecture consists of multiple layers that work together to provide a complete messaging system:
Broker Layer
Stateless serving layer that handles message routing and client connections
Storage Layer
Persistent storage using Apache BookKeeper for durable message storage
Metadata Layer
Coordination and metadata storage using ZooKeeper or other metadata stores
Client Layer
Producer and consumer clients that interact with brokers
Core Components
Brokers
Brokers are stateless service nodes that handle message routing and client connections. From the source code at pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java, brokers provide:
- Client connection handling (binary protocol on port 6650 by default)
- HTTP/REST API (port 8080 by default)
- Topic ownership management
- Message routing between producers and consumers
- Load balancing across the cluster
Brokers are stateless, which means they can be added to or removed from the cluster dynamically without data migration. Message data is stored persistently in BookKeeper, and cluster metadata lives in the metadata store.
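To illustrate why statelessness avoids data migration, here is a toy Python model. The `SharedLog`, `Broker`, and `Cluster` classes are hypothetical stand-ins for BookKeeper, a broker, and the metadata store; they are not Pulsar APIs. Real Pulsar assigns ownership per namespace bundle through its load manager, while this sketch assigns whole topics to the least-loaded broker.

```python
# Toy model only: SharedLog, Broker, and Cluster are hypothetical stand-ins,
# not Pulsar classes.

class SharedLog:
    """Stand-in for BookKeeper: durable storage shared by every broker."""

    def __init__(self):
        self.entries = {}  # topic -> list of messages

    def append(self, topic, msg):
        self.entries.setdefault(topic, []).append(msg)

    def read(self, topic):
        return list(self.entries.get(topic, []))


class Broker:
    """Stateless broker: it keeps nothing durable of its own."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def publish(self, topic, msg):
        self.log.append(topic, msg)

    def read_all(self, topic):
        return self.log.read(topic)


class Cluster:
    def __init__(self):
        self.log = SharedLog()
        self.brokers = {}  # name -> Broker
        self.owners = {}   # topic -> broker name (stand-in for the metadata store)

    def add_broker(self, name):
        self.brokers[name] = Broker(name, self.log)

    def owner(self, topic):
        # Lazily assign an unowned topic to the least-loaded broker
        # (ties broken alphabetically).
        if topic not in self.owners:
            load = {name: 0 for name in self.brokers}
            for name in self.owners.values():
                load[name] += 1
            self.owners[topic] = min(sorted(load), key=load.get)
        return self.brokers[self.owners[topic]]

    def remove_broker(self, name):
        # Failover: forget the broker's ownerships. No message data is copied,
        # because it never lived on the broker.
        del self.brokers[name]
        self.owners = {t: b for t, b in self.owners.items() if b != name}


cluster = Cluster()
cluster.add_broker("broker-1")
cluster.add_broker("broker-2")
cluster.owner("orders").publish("orders", "m1")
first = cluster.owner("orders").name
cluster.remove_broker(first)
survivor = cluster.owner("orders")
print(first, "->", survivor.name, survivor.read_all("orders"))
# broker-1 -> broker-2 ['m1']
```

The surviving broker serves the message published through the removed one, because the only thing that changed hands was the ownership record.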
Apache BookKeeper (Storage Layer)
BookKeeper provides the persistent storage layer through a system of ledgers. The managed ledger abstraction (from managed-ledger/src/main/java/org/apache/bookkeeper/mledger/ManagedLedger.java) sits on top of BookKeeper:
- Ledgers: Immutable, append-only log segments
- Managed Ledgers: Named log streams that automatically manage multiple BookKeeper ledgers
- Cursors: Track consumer positions within managed ledgers
ManagedLedger automatically handles ledger rollover, garbage collection, and cursor management. Each persistent topic (or each partition of a partitioned topic) is backed by exactly one ManagedLedger.
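The interplay of ledgers, rollover, and cursors can be sketched in a few lines of Python. `ManagedLedgerSketch` is a hypothetical class, not the Java `ManagedLedger` interface. It rolls over after a fixed number of entries (real rollover thresholds are configurable) and deletes a ledger only once every cursor has moved past it, ignoring retention policies.

```python
class ManagedLedgerSketch:
    """Hypothetical model: one named stream stored as a chain of ledgers."""

    def __init__(self, name, max_entries_per_ledger):
        self.name = name
        self.max_entries = max_entries_per_ledger
        self.ledgers = {0: []}  # ledger id -> entries (append-only)
        self.current = 0        # id of the ledger accepting writes
        self.next_id = 1
        self.cursors = {}       # cursor name -> (ledger id, next entry id)

    def add_entry(self, data):
        # Roll over to a fresh ledger when the current one is full;
        # the full ledger is never written again.
        if len(self.ledgers[self.current]) >= self.max_entries:
            self.current = self.next_id
            self.next_id += 1
            self.ledgers[self.current] = []
        self.ledgers[self.current].append(data)
        return (self.current, len(self.ledgers[self.current]) - 1)

    def open_cursor(self, name):
        # A new cursor starts at the oldest retained ledger.
        self.cursors[name] = (min(self.ledgers), 0)

    def read_next(self, name):
        ledger_id, entry_id = self.cursors[name]
        # Step past the end of a finished ledger into the next one.
        while entry_id >= len(self.ledgers[ledger_id]):
            if ledger_id == self.current:
                return None  # cursor has caught up with the writer
            ledger_id, entry_id = ledger_id + 1, 0
        data = self.ledgers[ledger_id][entry_id]
        self.cursors[name] = (ledger_id, entry_id + 1)
        return data

    def trim(self):
        # Delete ledgers that lie entirely behind the slowest cursor.
        if not self.cursors:
            return
        oldest_needed = min(lid for lid, _ in self.cursors.values())
        for lid in [l for l in self.ledgers if l < oldest_needed]:
            del self.ledgers[lid]


ml = ManagedLedgerSketch("persistent://public/default/orders", max_entries_per_ledger=2)
ml.open_cursor("sub-a")
for i in range(5):
    ml.add_entry(f"msg-{i}")
print(sorted(ml.ledgers))  # [0, 1, 2]
got = [ml.read_next("sub-a") for _ in range(3)]
print(got)                 # ['msg-0', 'msg-1', 'msg-2']
ml.trim()
print(sorted(ml.ledgers))  # [1, 2]
```

Ledger 0 is deleted because the only cursor has consumed it, while ledgers 1 and 2 are kept because they still hold unread entries.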
Metadata Store
The metadata store coordinates the cluster and stores configuration. From conf/broker.conf, Pulsar supports:
- ZooKeeper: Traditional coordination service
- etcd: Alternative metadata store
- RocksDB: Embedded metadata store for single-node deployments
The metadata store holds:
- Topic ownership assignments
- Namespace configurations
- Schema registry
- Cluster coordination
- Tenant and namespace metadata
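Topic ownership assignments depend on the metadata store rejecting conflicting writes, so that two brokers cannot both claim the same bundle. The Python sketch below models this with a versioned compare-and-set. `MetadataStoreSketch`, `try_acquire`, the `-1` "must not exist" convention, and the `/namespace/...` path are illustrative assumptions, not Pulsar's API. In ZooKeeper-backed deployments a similar role is played by ephemeral nodes that vanish when the owning broker's session expires, which the `delete` call stands in for here.

```python
class BadVersionError(Exception):
    pass


class MetadataStoreSketch:
    """Hypothetical in-memory key-value store with versioned writes."""

    def __init__(self):
        self._data = {}  # path -> (value, version)

    def get(self, path):
        return self._data.get(path)

    def put(self, path, value, expected_version):
        # In this sketch, expected_version=-1 means "path must not exist yet".
        current = self._data.get(path)
        current_version = -1 if current is None else current[1]
        if current_version != expected_version:
            raise BadVersionError(path)
        self._data[path] = (value, current_version + 1)
        return current_version + 1

    def delete(self, path):
        self._data.pop(path, None)


def try_acquire(store, bundle, broker):
    """Claim a bundle only if it is unowned; return whoever owns it afterwards."""
    path = f"/namespace/{bundle}"  # illustrative path layout
    try:
        store.put(path, broker, expected_version=-1)
    except BadVersionError:
        pass  # another broker already holds the claim
    return store.get(path)[0]


store = MetadataStoreSketch()
bundle = "public/default/0x00000000_0x40000000"
owner1 = try_acquire(store, bundle, "broker-1")
owner2 = try_acquire(store, bundle, "broker-2")  # rejected: already owned
store.delete(f"/namespace/{bundle}")             # e.g. broker-1's session expired
owner3 = try_acquire(store, bundle, "broker-2")
print(owner1, owner2, owner3)  # broker-1 broker-1 broker-2
```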
Separation of Concerns
Pulsar’s architecture separates three critical concerns: serving, storage, and metadata coordination.
Serving (Brokers)
Stateless brokers handle protocol, routing, and client connections. They can scale independently based on connection count and throughput.
Storage (BookKeeper)
BookKeeper bookies store messages durably. Storage can scale independently based on data retention requirements.
This separation provides:
- Independent scaling: Scale serving and storage independently
- Fast failover: Because brokers hold no durable state, another broker can take over a failed broker’s topics without copying data
- No data rebalancing: Adding/removing brokers doesn’t require data movement
- Operational flexibility: Upgrade components independently
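Because storage scales independently, the number of bookies can be estimated from ingest rate and retention alone; the broker count never enters the calculation. The helper below is a hypothetical back-of-the-envelope sketch: the ingest rate, retention window, disk size, and 70% usable-capacity headroom are illustrative assumptions, and it ignores journal disks, index storage, and compression.

```python
import math


def bookies_needed(ingest_mb_per_s, retention_hours, write_quorum,
                   disk_tb_per_bookie, headroom=0.7):
    """Rough bookie count needed to hold the retained data (hypothetical helper)."""
    retained_tb = ingest_mb_per_s * 3600 * retention_hours / 1_000_000  # MB -> TB
    stored_tb = retained_tb * write_quorum           # each entry stored write_quorum times
    usable_tb = disk_tb_per_bookie * headroom        # keep free space on every disk
    return math.ceil(stored_tb / usable_tb)


# 50 MB/s for 3 days, 3 copies, 4 TB disks:
# 12.96 TB retained -> 38.88 TB stored -> 38.88 / 2.8 = 13.9 -> 14 bookies
print(bookies_needed(50, 72, 3, 4))   # 14
# Doubling retention doubles storage without touching the broker tier.
print(bookies_needed(50, 144, 3, 4))  # 28
```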
Message Flow
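Messages flow through the system as follows: a producer sends a message to the broker that owns the topic; the broker appends it to the topic’s managed ledger, which writes the entry to BookKeeper bookies; once the write is acknowledged, the broker confirms the publish to the producer and dispatches the message to consumers on the topic’s subscriptions.
High Availability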
Pulsar achieves high availability through:
- Broker redundancy: Topic ownership is distributed across multiple brokers, with each topic served by one broker at a time
- BookKeeper quorum: Configurable replication (typically 3 copies)
- Metadata store quorum: ZooKeeper/etcd ensemble (3 or 5 nodes)
- Automatic failover: Topic ownership transfers automatically on broker failure
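The quorum figures above can be checked with a little arithmetic. In BookKeeper terms, each entry is written to a write quorum (Qw) of bookies chosen from an ensemble (E) and is acknowledged once an ack quorum (Qa) responds; in broker.conf these correspond to `managedLedgerDefaultEnsembleSize`, `managedLedgerDefaultWriteQuorum`, and `managedLedgerDefaultAckQuorum`. The Python functions below are a simplified, hypothetical model: real BookKeeper replaces failed bookies through an ensemble change instead of simply failing the write.

```python
def write_set(entry_id, ensemble, write_quorum):
    """Bookies that store a given entry (round-robin striping, simplified)."""
    return [ensemble[(entry_id + k) % len(ensemble)] for k in range(write_quorum)]


def add_entry(entry_id, ensemble, write_quorum, ack_quorum, alive):
    """True once at least ack_quorum bookies in the write set confirm the write."""
    acks = sum(1 for b in write_set(entry_id, ensemble, write_quorum) if b in alive)
    return acks >= ack_quorum


def bookie_failures_tolerated(ack_quorum):
    # An acknowledged entry lives on at least ack_quorum bookies,
    # so losing ack_quorum - 1 of them still leaves one copy.
    return ack_quorum - 1


def metadata_failures_tolerated(ensemble_size):
    # ZooKeeper and etcd need a strict majority of nodes to make progress.
    return (ensemble_size - 1) // 2


bookies = ["b1", "b2", "b3"]
print(write_set(0, bookies, 3))                   # ['b1', 'b2', 'b3']
print(add_entry(0, bookies, 3, 2, {"b1", "b3"}))  # True: 2 of 3 acknowledged
print(add_entry(0, bookies, 3, 2, {"b1"}))        # False: only 1 acknowledgement
print(write_set(3, ["b1", "b2", "b3", "b4", "b5"], 3))  # ['b4', 'b5', 'b1']
print(bookie_failures_tolerated(2))               # 1
print([metadata_failures_tolerated(n) for n in (3, 5)])  # [1, 2]
```

The last line is why metadata ensembles are sized at 3 or 5 nodes: 3 nodes survive one failure, 5 survive two.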
Configuration
Key broker configuration parameters from conf/broker.conf:
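As a hedged illustration, the Python sketch below reads a few broker.conf-style `key=value` settings and checks the BookKeeper constraint that ensemble size >= write quorum >= ack quorum. The keys `clusterName`, `metadataStoreUrl`, `brokerServicePort`, `webServicePort`, and the `managedLedgerDefault*` quorum settings are real broker.conf parameters, but the values shown are illustrative assumptions rather than Pulsar's shipped defaults, and `parse_conf` and `check_quorums` are hypothetical helpers, not part of Pulsar.

```python
def parse_conf(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        conf[key.strip()] = value.strip()
    return conf


def check_quorums(conf):
    """Return (E, Qw, Qa) or raise if they violate E >= Qw >= Qa >= 1."""
    e = int(conf["managedLedgerDefaultEnsembleSize"])
    qw = int(conf["managedLedgerDefaultWriteQuorum"])
    qa = int(conf["managedLedgerDefaultAckQuorum"])
    if not e >= qw >= qa >= 1:
        raise ValueError(f"need ensemble >= write >= ack >= 1, got {e}, {qw}, {qa}")
    return e, qw, qa


SAMPLE = """
# Illustrative values only
clusterName=example-cluster
metadataStoreUrl=zk:zk1:2181,zk2:2181,zk3:2181
brokerServicePort=6650
webServicePort=8080
managedLedgerDefaultEnsembleSize=3
managedLedgerDefaultWriteQuorum=3
managedLedgerDefaultAckQuorum=2
"""

conf = parse_conf(SAMPLE)
print(conf["brokerServicePort"], check_quorums(conf))  # 6650 (3, 3, 2)
```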
Next Steps
Messaging Model
Learn about Pulsar’s messaging semantics
Topics
Understand how topics organize messages
Multi-Tenancy
Explore Pulsar’s tenant isolation features
Subscriptions
Deep dive into subscription types