
Pulsar Architecture and Design

Pulsar provides a "turn-key" architecture: you don't have to build complex tooling outside your cluster to make it work the way you want, because most of what you need is built directly into the system. This lowers complexity, since Pulsar takes care of many tasks for you.

Multi-layered architecture

Pulsar's multi-layered architecture separates the message storage layer from the message serving layer. This decoupling of the storage and serving layers provides flexibility that allows Pulsar to map to a broad set of use cases and to scale dynamically without downtime.

The three layers of the Pulsar architecture, from top to bottom, are:

  • API layer (or serving layer)
  • Compute layer (physical disk, RAM, CPU)
  • Storage layer (message retention)

See also: multi-layer architecture video.

Architecture components

At the core of the Pulsar architecture is the cluster. A Pulsar cluster consists of three main components:

  • Pulsar brokers
  • Metadata storage
  • Apache BookKeeper

The image below shows the relationship between these components.

[Figure: Architecture]

When you first deploy a Pulsar cluster, all of these components are included. You don't have to manage each of these separately.

Pulsar brokers

Brokers handle all the data going in and out of Pulsar, including message routing and client connections. They are stateless, so you can add more on demand.

Metadata storage

A ZooKeeper quorum (or another metadata storage option) provides cluster-level configuration, coordination, and service discovery. It stores metadata for both Pulsar and BookKeeper.

Apache BookKeeper

Bookies (the BookKeeper storage nodes) store both messages and cursors (a cursor is part of a subscription). Messages are grouped in segments, also known as ledgers. A group of bookies forms an "ensemble" to store a ledger. The ensemble is the set of bookies across which your data is distributed, which provides data resiliency, high availability (HA), and durability.
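
As a rough illustration of how an ensemble can spread a ledger's entries across bookies, the sketch below models simple round-robin striping. The names `write_set`, `ensemble`, and `write_quorum` are illustrative assumptions, not BookKeeper APIs, and the write quorum (how many bookies receive each entry) is a BookKeeper setting not described above. Actual placement is decided by BookKeeper itself.

```python
from typing import List


def write_set(entry_id: int, ensemble: List[str], write_quorum: int) -> List[str]:
    """Return the bookies that would hold a copy of one entry.

    Models round-robin striping: each entry starts at the next bookie in
    the ensemble and is copied to `write_quorum` consecutive bookies.
    """
    if not 1 <= write_quorum <= len(ensemble):
        raise ValueError("write_quorum must be between 1 and the ensemble size")
    start = entry_id % len(ensemble)
    return [ensemble[(start + i) % len(ensemble)] for i in range(write_quorum)]


# Hypothetical three-bookie ensemble keeping two copies of every entry.
ensemble = ["bookie-1", "bookie-2", "bookie-3"]
for entry_id in range(4):
    print(entry_id, write_set(entry_id, ensemble, write_quorum=2))
```

In this model every entry lives on two different bookies, so losing any single bookie still leaves one copy of each entry, which is the intuition behind the resiliency claim.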

Multi-tenancy

Multi-tenancy allows you to support multiple organizations (or sub-organizations) within your company on a single platform. A single Pulsar cluster can support many tenants and allows you to map Pulsar topics to different teams, applications, or use cases. This hierarchical structure serves as the foundation of security and allows for unified, global management of multiple clusters.

The underlying components for multi-tenancy include:

  • Instance - a group of Pulsar clusters that act together as a single unit.
  • Cluster - a group of brokers and bookies that create a secure messaging environment within Pulsar.
  • Tenant - the administrative unit within a shared environment.
  • Namespace - a grouping mechanism for related topics.
  • Topic - a unit of storage that structures data in Pulsar and organizes messages into a stream.
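
The tenant, namespace, and topic levels show up directly in fully qualified topic names, which take the form `persistent://tenant/namespace/topic` (or `non-persistent://...`). The sketch below builds and parses such names. The `TopicName` class and the `acme`/`orders`/`created` values are hypothetical, and legacy name formats that include a cluster component are deliberately not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicName:
    """Illustrative holder for the parts of a fully qualified topic name."""
    tenant: str
    namespace: str
    topic: str
    domain: str = "persistent"  # or "non-persistent"

    def __str__(self) -> str:
        return f"{self.domain}://{self.tenant}/{self.namespace}/{self.topic}"

    @classmethod
    def parse(cls, name: str) -> "TopicName":
        domain, sep, rest = name.partition("://")
        parts = rest.split("/")
        valid_domain = domain in ("persistent", "non-persistent")
        if not sep or not valid_domain or len(parts) != 3 or not all(parts):
            raise ValueError(f"not a fully qualified topic name: {name!r}")
        tenant, namespace, topic = parts
        return cls(tenant, namespace, topic, domain)


# Hypothetical tenant "acme" with an "orders" namespace.
name = TopicName("acme", "orders", "created")
print(name)
print(TopicName.parse(str(name)).namespace)
```

Keeping the tenant and namespace explicit in every topic name is what lets policies and permissions be applied at those levels rather than topic by topic.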

See the multi-tenancy get started guide, or watch the short video, to learn more about multi-tenancy in Pulsar.

API design

The Pulsar API design supports different layers of API access. The pub-sub API is the core API that your applications use when interacting with Pulsar.
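
For a feel of the pub-sub API, here is a minimal sketch using the Pulsar Python client (the `pulsar-client` package). The helper names `round_trip` and `main`, the service URL, the topic, and the subscription name are all placeholders chosen for illustration, and `main` assumes a broker is reachable at `pulsar://localhost:6650`.

```python
def round_trip(client, topic, subscription, payloads):
    """Publish the payloads to a topic and read them back on a subscription."""
    # Subscribe before publishing so the new subscription sees the messages.
    consumer = client.subscribe(topic, subscription)
    producer = client.create_producer(topic)
    received = []
    try:
        for payload in payloads:
            producer.send(payload)  # payloads are bytes
        for _ in payloads:
            msg = consumer.receive()
            received.append(msg.data())
            consumer.acknowledge(msg)  # mark the message as processed
    finally:
        producer.close()
        consumer.close()
    return received


def main():
    # Imported here so the helper above loads even without the client installed.
    import pulsar

    client = pulsar.Client("pulsar://localhost:6650")  # assumed local broker
    try:
        topic = "persistent://public/default/demo"  # placeholder topic
        print(round_trip(client, topic, "demo-subscription", [b"hello", b"pulsar"]))
    finally:
        client.close()
```

Calling `main()` against a running cluster exercises the producer, consumer, and acknowledgment calls that make up the core of the pub-sub API.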

The Pulsar design accommodates organizations of all sizes and scales to meet the needs of different teams. For example:

  • Application Teams - Pulsar provides self-service for onboarding, adding new apps, and customizing for use cases.
  • Data and Platform Teams - Pulsar integrates with existing systems and applications and delivers high-throughput historical reads.
  • Cluster Operators - Pulsar provides easy administration, resource isolation, clear visibility, and usage standards.

[Figure: API Design]

For a quick overview, watch the Pulsar API design video.

Pulsar ecosystem

The Pulsar ecosystem consists of a number of components that allow Pulsar to interoperate with other systems and extend its functionality.

  • Functions provide lightweight stream processing that offers a way to run programs with Pulsar. Functions take care of boilerplate code so the developer can focus on the business problem instead of the code.
  • Connectors are a simple way to integrate with external systems and move data in and out of Pulsar. Connectors are built on top of Pulsar Functions. See also StreamNative Hub.
  • Protocol handlers allow Pulsar to "speak" with different protocols. These handlers are implemented via a plugin and run inside the Pulsar cluster. You do not need to deploy a separate component.
  • Processing engines: Pulsar provides adapters for common processing engines such as Flink and Spark, as well as Pulsar SQL (Trino).
  • Offloaders allow offloading data to cloud storage using existing Pulsar APIs. Tiered Storage makes use of offloaders to move data out of BookKeeper and into cost-effective cloud storage. (Current offloaders include S3 and S3-compatible storage, GCS, Azure Blob Storage, HDFS, and file systems.)
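
To make the Functions item above concrete, the sketch below shows a function written as a plain Python function. Pulsar Functions can run a Python module that exposes a `process` function, which is called once per incoming message and whose return value is published to the function's output topic. The transformation itself is only an illustration.

```python
def process(input):
    """Append an exclamation mark to each incoming message."""
    return "{}!".format(input)
```

The function contains only the business logic. Reading from the input topic, writing to the output topic, and acknowledging messages are handled by the Functions runtime once the function is deployed to the cluster.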

[Figure: Pulsar Ecosystem end-to-end view]

As shown in the diagram, StreamNative leverages the Pulsar ecosystem to deliver both stream storage and stream compute for a complete end-to-end streaming solution.
