Pulsar Storage Model

Pulsar's storage model combines distributed, durable, and scalable storage components that work together to provide efficient and reliable message storage and processing.

Apache BookKeeper

Pulsar uses Apache BookKeeper for persistent message storage. BookKeeper is a distributed write-ahead log (WAL) system, sometimes described as a distributed journal. By default, Pulsar persistently stores all unacknowledged messages on multiple BookKeeper bookies (storage nodes), so that losing a single bookie does not lose messages.
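As a rough illustration of what "stored on multiple bookies" means, the sketch below models round-robin striping: each entry is copied to a fixed number of bookies, starting at a position derived from the entry ID. The names `bookies_for_entry`, `ensemble`, and `write_quorum` are chosen for this example, and the model ignores acknowledgements, failures, and bookie replacement, so treat it as a simplified picture rather than BookKeeper's actual implementation.

```python
def bookies_for_entry(entry_id, ensemble, write_quorum):
    """Return the bookies that would hold a copy of one entry.

    Simplified model (an assumption for illustration): copies are placed
    round-robin across the ensemble, starting at entry_id modulo its size.
    """
    if not 1 <= write_quorum <= len(ensemble):
        raise ValueError("write_quorum must be between 1 and the ensemble size")
    start = entry_id % len(ensemble)
    return [ensemble[(start + i) % len(ensemble)] for i in range(write_quorum)]


# Hypothetical three-bookie ensemble with two copies per entry.
ensemble = ["bookie-1", "bookie-2", "bookie-3"]
placement = {e: bookies_for_entry(e, ensemble, write_quorum=2) for e in range(4)}

for entry_id, targets in placement.items():
    print(entry_id, targets)
```

In this model every entry lives on two different bookies, and consecutive entries land on different pairs, which spreads both the write load and the stored data across the storage nodes.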

BookKeeper is a scalable, fault-tolerant, and low-latency storage service optimized for real-time workloads.

How Pulsar Scales

Pulsar's cloud-native architecture takes advantage of the cloud with rapid, automated scale-out. It is designed for failure management, automatic recovery, and built-in high availability. All of this is possible because Pulsar decouples compute (brokers) from storage (bookies), so each tier can be scaled on its own.

There is more than one type of scaling, and it is not just about more throughput. If you need high throughput and a large amount of storage, and you need to support this across tens of thousands or even hundreds of thousands of topics, you are pulling the system along different axes and in different directions.

Pulsar can handle all of these scenarios by using different techniques.

Horizontal Scaling

Horizontal scaling (or "scaling out") refers to adding additional nodes or machines to your infrastructure to cope with new demands.

  • To support more data storage, you add more bookies. When you add more bookies, you get more disk space.
  • To support more throughput on a topic, you add more partitions. For example, if you have three partitions on a topic but need to double the throughput, you just need to double the number of partitions.
  • To support more regions or to support resource isolation, you add more clusters. You can add another cluster to your Pulsar instance, and then expand to multiple data centers or isolate certain workloads.
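The first two bullets above come down to simple capacity arithmetic, which can be sketched as follows. The helper names and all of the numbers (10 MB/s per partition, 8 TB of usable disk per bookie, three copies of each message) are assumptions made up for this example, and the partition estimate assumes throughput grows roughly linearly with the partition count.

```python
import math


def partitions_for(target_mb_s, per_partition_mb_s):
    """Partitions needed to reach a target throughput (assumes linear scaling)."""
    return math.ceil(target_mb_s / per_partition_mb_s)


def bookies_for(data_tb, replicas, usable_tb_per_bookie):
    """Bookies needed to hold data_tb of messages stored `replicas` times."""
    return math.ceil(data_tb * replicas / usable_tb_per_bookie)


# A topic with 3 partitions at an assumed 10 MB/s each handles about 30 MB/s.
current_partitions = partitions_for(30, 10)
# Doubling the target throughput doubles the partition count.
doubled_partitions = partitions_for(60, 10)

# 20 TB of retained messages, 3 copies each, 8 TB usable per bookie.
needed_bookies = bookies_for(20, 3, 8)

print(current_partitions, doubled_partitions, needed_bookies)
```

Real per-partition throughput and usable disk per bookie depend on hardware and configuration, so measured values should replace the placeholders here.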

Vertical Scaling

Vertical scaling (or "scaling up") refers to adding additional resources to a system so that it meets the demand for more capacity or power.

  • To support more throughput, you use faster BookKeeper disks. For example, you can increase the throughput on a single topic by moving from a disk with 10,000 IOPS to one with 50,000 IOPS.
  • To support more topics, you give each broker more resources. For example, you can handle more topics by increasing a broker from 2 CPUs and 4 GB of memory to 8 CPUs and 64 GB of memory.

Scaling with a smart design

In addition to horizontal and vertical scaling, you can scale to meet the needs of an organization by using key Pulsar features.

  • Use multi-tenancy to support more teams and applications on a cluster.
  • Use tiered storage, which is much more cost-effective, to offload data from the BookKeeper tier.
  • Use topic bundles to make topics cheaper (and thereby allow for more topics).
  • Allow producers and consumers to multiplex connections to support more topics at the client side.
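To make the bundle idea in the list above more concrete: topics in a namespace are grouped into bundles by hashing the topic name into a fixed hash range, and brokers take ownership of whole bundles instead of tracking every topic individually. The sketch below is a minimal model of that grouping. The function name, the use of `zlib.crc32`, the equal-width ranges, and the topic names are all assumptions for illustration and do not reproduce Pulsar's own code.

```python
import zlib
from collections import Counter

HASH_SPACE = 2**32  # size of the hash range shared by all bundles


def bundle_for_topic(topic, num_bundles):
    """Map a topic name to a bundle index by splitting the hash range evenly."""
    h = zlib.crc32(topic.encode("utf-8"))  # value in 0 .. 2**32 - 1
    width = HASH_SPACE // num_bundles
    # Clamp so hashes in the remainder of an uneven split fall in the last bundle.
    return min(h // width, num_bundles - 1)


# Hypothetical namespace with 10,000 topics spread over 4 bundles.
topics = [f"persistent://tenant/ns/topic-{i}" for i in range(10_000)]
assignment = {t: bundle_for_topic(t, 4) for t in topics}
per_bundle = Counter(assignment.values())

print(dict(per_bundle))
```

With this grouping, load balancing decisions involve only four units of ownership rather than 10,000, which is why bundling lowers the per-topic cost.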

Common use cases for scaling

Designing your Pulsar deployment includes considering your scaling needs. There is no single way to scale a Pulsar cluster. Scaling depends heavily on usage patterns, and Pulsar gives you the flexibility to tailor the solution to your needs.

Scenario 1: High volume of tailing reads and fanout

If you are doing a lot of tailing reads and fanout, you can add more brokers to keep up. For example, you have a lot of consumers at the end of the stream and you are distributing one message to hundreds or even thousands of consumers. In this case, recent messages are cached in broker memory and served to consumers from there, so there is no need to fetch them from the bookies.

Scenario 2: Lots of catch-up reads

If you have a lot of catch-up reads, you can scale out your bookies horizontally (add more) and scale them vertically by adding more memory to your existing bookies for a better read cache. For example, if you are running a daily job or weekly job that sends a lot of data, you are going to need more bookies to store that data.
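The difference between the two read patterns in Scenario 1 and Scenario 2 can be seen in a toy model of a broker cache. The class below is a sketch written for this example, not Pulsar code: it keeps only the most recent entries in memory and counts how many reads are served from that cache versus from storage. The cache size, message counts, and consumer counts are arbitrary.

```python
from collections import OrderedDict


class BrokerCacheModel:
    """Toy broker: recent entries stay in memory, everything is also on bookies."""

    def __init__(self, cache_size):
        self.cache_size = cache_size
        self.cache = OrderedDict()  # entry_id -> payload, oldest first
        self.bookie_log = {}        # stands in for BookKeeper storage
        self.cache_hits = 0
        self.bookie_reads = 0

    def publish(self, entry_id, payload):
        self.bookie_log[entry_id] = payload
        self.cache[entry_id] = payload
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the oldest cached entry

    def read(self, entry_id):
        if entry_id in self.cache:
            self.cache_hits += 1
            return self.cache[entry_id]
        self.bookie_reads += 1
        return self.bookie_log[entry_id]


model = BrokerCacheModel(cache_size=100)
for i in range(1000):
    model.publish(i, f"msg-{i}")

# Tailing reads with fanout: 50 consumers each read the latest 10 entries.
for _ in range(50):
    for i in range(990, 1000):
        model.read(i)
tailing = (model.cache_hits, model.bookie_reads)

# Catch-up read: one consumer replays the topic from the beginning.
model.cache_hits = model.bookie_reads = 0
for i in range(1000):
    model.read(i)
catch_up = (model.cache_hits, model.bookie_reads)

print("tailing (cache, bookie):", tailing)
print("catch-up (cache, bookie):", catch_up)
```

In this model the tailing consumers never touch storage, so the load lands on brokers, while the replaying consumer reads most entries from storage, so the load lands on bookies.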

Scenario 3: Faster write throughput

If you are interested primarily in write throughput, you can add bookies with fast disks but less memory. For example, you are using Pulsar primarily to sync messages. You are writing large amounts of data with huge numbers of messages that may or may not be read. In this case, it is more important for the bookie tier to have very fast journal disks, and bookie memory is not as important.

What are proxies?

A Pulsar proxy is an optional component you can put in front of your brokers. Producers and consumers connect to the proxy instead of to individual brokers, and the proxy routes each connection to the appropriate broker in the underlying pool. Because clients have no direct connection to the brokers, you can scale the broker pool more dynamically.
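The sketch below models why a proxy makes broker scaling less visible to clients. It is a toy written for this example: the class, host names, and the hash-based broker choice are assumptions, and the hash only stands in for the way a real proxy finds the broker that serves a topic.

```python
import zlib


class ProxyModel:
    """Toy proxy: clients know one address, the proxy knows the broker pool."""

    def __init__(self, service_url, brokers):
        self.service_url = service_url  # the only address clients configure
        self.brokers = list(brokers)

    def add_broker(self, broker):
        self.brokers.append(broker)

    def route(self, topic):
        # Placeholder for topic lookup: pick a broker deterministically.
        index = zlib.crc32(topic.encode("utf-8")) % len(self.brokers)
        return self.brokers[index]


proxy = ProxyModel("pulsar://proxy.example.com:6650", ["broker-1", "broker-2"])
client_url = proxy.service_url

before = proxy.route("persistent://tenant/ns/orders")

# Scale out the broker pool; the client configuration is untouched.
proxy.add_broker("broker-3")
after = proxy.route("persistent://tenant/ns/orders")

print(client_url, before, after)
```

The client-side address stays the same before and after the pool grows; only the proxy's view of the brokers changes.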

Next steps

Pulsar Architecture and Design