Pulsar Architecture and Design
Pulsar provides a "turn-key" architecture. You don't have to build complex tooling outside your cluster to make it work the way you want, because most of what you need is built directly into the system. In this way, Pulsar lowers operational complexity.
Pulsar's multi-layered architecture separates the message storage layer from the message serving layer. This decoupling provides the flexibility that allows Pulsar to map to a broad set of use cases and to scale dynamically without downtime.
The three layers of the Pulsar architecture, from top to bottom, are:
- API layer (or serving layer)
- Compute layer (physical disk, RAM, CPU)
- Storage layer (message retention)
See also: multi-layer architecture video.
At the core of the Pulsar architecture is the cluster. A Pulsar cluster consists of three main components:
- Pulsar brokers
- Metadata storage
- Apache BookKeeper
You can see the relationship between these components in the image below.
When you first deploy a Pulsar cluster, all of these components are included. You don't have to manage each of these separately.
Brokers handle all the data going in and out of Pulsar, including message routing and client connections. They are stateless, so you can add more on demand.
A ZooKeeper quorum (or other metadata storage option) provides cluster-level configuration, coordination, and service discovery. It stores metadata for both Pulsar and BookKeeper.
Bookies store both messages and cursors (a cursor is part of a subscription). Messages are grouped in segments (also known as ledgers). A group of bookies forms an "ensemble" to store a ledger. The ensemble is the set of bookies that you distribute your data across, which provides data resiliency, high availability, and durability.
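To make the idea of distributing a ledger across an ensemble more concrete, the following is a minimal sketch that assumes round-robin striping and a write quorum of 2 (the number of bookies that receive each entry). The bookie names, the `write_set` helper, and the quorum value are illustrative assumptions, not part of any Pulsar or BookKeeper API.

```python
from typing import List


def write_set(entry_id: int, ensemble: List[str], write_quorum: int) -> List[str]:
    """Return the bookies that would store one entry under round-robin striping.

    Hypothetical helper for illustration only; real placement is decided
    by the BookKeeper client, not by application code.
    """
    if not 1 <= write_quorum <= len(ensemble):
        raise ValueError("write_quorum must be between 1 and the ensemble size")
    # Each entry starts at a different bookie, so load spreads across the ensemble.
    start = entry_id % len(ensemble)
    return [ensemble[(start + i) % len(ensemble)] for i in range(write_quorum)]


# Assumed three-bookie ensemble with made-up names.
ensemble = ["bookie-1", "bookie-2", "bookie-3"]
for entry_id in range(4):
    print(entry_id, write_set(entry_id, ensemble, write_quorum=2))
```

In this sketch every entry lands on two of the three bookies, so losing a single bookie still leaves one copy of each entry available.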
Multi-tenancy allows you to support multiple organizations (or sub-organizations) within your company on a single platform. A single Pulsar cluster can support many tenants and allows you to map Pulsar topics to different teams, applications, or use cases. This hierarchical structure serves as the foundation of security and allows for unified, global management of multiple clusters.
The underlying components for multi-tenancy include:
- Instance - a group of Pulsar clusters that act together as a single unit.
- Cluster - a group of brokers and bookies that create a secure messaging environment within Pulsar.
- Tenant - the administrative unit within a shared environment.
- Namespace - a grouping mechanism for related topics.
- Topic - a unit of storage that structures data in Pulsar and organizes messages into a stream.
See the multi-tenancy get started guide or watch a short video to learn more about multi-tenancy in Pulsar.
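The tenant, namespace, and topic levels of this hierarchy appear directly in fully qualified topic names, which take the form `persistent://tenant/namespace/topic`. The sketch below splits such a name into its parts. The `parse_topic_name` helper and the `acme/orders/created` example are hypothetical and not part of the Pulsar client API.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic_name(name: str) -> TopicName:
    """Split a fully qualified topic name into domain, tenant, namespace, topic."""
    domain, sep, rest = name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unexpected topic domain in {name!r}")
    # Split at most twice so the remainder stays in the topic part.
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    return TopicName(domain, *parts)


# Hypothetical tenant "acme" with an "orders" namespace.
parsed = parse_topic_name("persistent://acme/orders/created")
print(parsed.tenant, parsed.namespace, parsed.topic)
```

Because the tenant and namespace are part of every topic name, policies and permissions set at those levels can apply to all topics beneath them.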
The Pulsar API Design supports different layers of API access. The pub-sub API is the core API your applications make use of when interacting with Pulsar.
The Pulsar design can accommodate organizations of all sizes and scales to meet the various needs across different teams. For example:
- Application Teams - Pulsar provides self-service for onboarding, adding new apps, and customizing for use cases.
- Data and Platform Teams - Pulsar integrates with existing systems and applications and delivers high-throughput historical reads.
- Cluster Operators - Pulsar provides easy administration, resource isolation, clear visibility, and usage standards.
For a quick overview, watch the Pulsar API design video.
The Pulsar ecosystem consists of a number of components that allow Pulsar to interoperate with other systems and extend its functionality.
- Functions provide lightweight stream processing that offers a way to run programs with Pulsar. Functions take care of boilerplate code so the developer can focus on the business problem instead of the code.
- Connectors are a simple way to integrate with external systems and move data in and out of Pulsar. Connectors are built on top of Pulsar Functions. See also StreamNative Hub.
- Protocol handlers allow Pulsar to "speak" with different protocols. These handlers are implemented via a plugin and run inside the Pulsar cluster. You do not need to deploy a separate component.
- Processing engine adapters connect Pulsar to the most common processing engines, such as Flink, Spark, and Pulsar SQL (Trino).
- Offloaders allow offloading data to cloud storage using existing Pulsar APIs. Tiered Storage makes use of offloaders to move data out of BookKeeper and into cost-effective cloud storage. (Current offloaders include S3 and S3 compatible, GCS, Azure Blob Storage, HDFS, and file systems.)
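As an illustration of how little boilerplate a function needs, the sketch below follows the "native" style of the Python runtime, where, as the Pulsar Functions documentation describes it, a function is a plain module-level `process` function that receives each message and returns the value to publish. The exclamation-mark logic is a placeholder, and deploying the function to a cluster is outside the scope of this sketch.

```python
def process(input):
    """Return the incoming message with an exclamation mark appended.

    The parameter is named `input` to follow the convention used in the
    Pulsar Functions documentation, even though it shadows the builtin.
    """
    return "{}!".format(input)


# Local check only; in a cluster, the Functions runtime would call
# process() once per message consumed from the input topic.
print(process("hello"))
```

The function contains only business logic: consuming from the input topic and publishing the result are handled by the runtime.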
As shown in the diagram, StreamNative leverages the Pulsar ecosystem to deliver both stream storage and stream compute for a complete end-to-end streaming solution.
- Set up multi-tenancy for your organization.
- Learn about the Pulsar messaging model.
- Learn about the Pulsar storage model.