Are you new to StreamNative? Trying to learn and understand? Listed below are terms and concepts relevant to understanding StreamNative products.
This Glossary is a work-in-progress. If you have feedback about terms and definitions you'd like to see included in this glossary, please email us.
BookKeeper is a distributed write-ahead log (WAL) system or distributed journal. Pulsar uses Apache BookKeeper for persistent message storage. By default, Pulsar persistently stores all unacknowledged messages on multiple BookKeeper bookies (storage nodes).
BookKeeper is a scalable, fault-tolerant, and low-latency storage service optimized for real-time workloads.
A Broker receives messages from producers, stores them and then delivers them to subscribed consumers.
A broker is the message dispatcher responsible for receiving messages from clients and sending messages to them. You typically have multiple brokers in a cluster, so if messages get backed up or a broker goes down, another broker can take on the extra load. This transfer can happen quickly because brokers are stateless.
A cluster is a secure messaging environment within Pulsar. Each Pulsar cluster consists of 3 components:
- Pulsar brokers - set of brokers handling all the data going in and out of Pulsar (or client requests)
- Metadata storage - providing coordination and service discovery between services
- Bookie ensemble - set of bookies that retain copies of the messages
A cluster has two layers: a stateless serving layer (made up of brokers) and a stateful storage layer (made up of bookies). See also Pulsar Architecture and design.
In StreamNative Cloud, you create one and only one cluster for an instance.
A consumer processes incoming messages and takes action based on the content of the message. In Pulsar, messages are sent to a specific topic, which is a logical name for a stream of data. A consumer subscribes to a topic and receives all messages published to that topic.
To receive a message, the consumer needs to send a request to the broker that handles this message. The message is dispatched when the client permits the broker to push it. Typically, the consumer uses a queue to accumulate the messages to consume (you can configure the receiverQueueSize).
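The permit-based flow described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyConsumer` and `push` are hypothetical names, not part of any Pulsar client, and the real protocol is more involved. It shows the broker pushing only as many messages as the consumer's receiver queue has room for.

```python
from collections import deque


class ToyConsumer:
    """Hypothetical consumer with a bounded receiver queue."""

    def __init__(self, receiver_queue_size):
        self.capacity = receiver_queue_size
        self.queue = deque()

    def permits(self):
        """Number of messages the broker is currently allowed to push."""
        return self.capacity - len(self.queue)

    def receive(self):
        """Hand the oldest queued message to the application."""
        return self.queue.popleft()


def push(backlog, consumer):
    """Broker side: push messages while the consumer still grants permits."""
    pushed = 0
    while backlog and consumer.permits() > 0:
        consumer.queue.append(backlog.popleft())
        pushed += 1
    return pushed


backlog = deque(["m0", "m1", "m2", "m3", "m4"])
consumer = ToyConsumer(receiver_queue_size=2)

first_push = push(backlog, consumer)   # fills the receiver queue
message = consumer.receive()           # the application frees one slot
second_push = push(backlog, consumer)  # only one more message fits
```

A larger receiver queue lets the broker send more messages ahead of processing, at the cost of more memory held on the consumer side.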
Pulsar manages the subscription cursor which determines the starting position to read data for consumers. Specifically, consumers read from the earliest unacknowledged message. If you need to manually manage the cursor and customize the starting position for consumers, go to readers.
Subscriptions use cursors to manage messages. Pulsar can go back to a specific message using a cursor. A cursor is a "restart" point. Each subscription for a topic has a cursor. The cursor contains information about message acknowledgements. When a consumer reads and processes a message, it sends an acknowledgment to the Pulsar broker and the cursor is updated. Updating the cursor ensures that the consumer will not receive that message again — even when the consumer crashes, recovers, and reattaches to the subscription.
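As a rough illustration of the cursor acting as a restart point, the sketch below models a subscription over an in-memory log. The `Subscription` class is a hypothetical simplification, not Pulsar's implementation; it only shows that acknowledged messages are skipped after a consumer reconnects.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    """Toy model of a subscription cursor over an append-only topic log."""
    log: list                                 # messages in publish order
    acked: set = field(default_factory=set)   # positions acknowledged so far

    def next_unacked(self):
        """Return the position of the earliest unacknowledged message, or None."""
        for position in range(len(self.log)):
            if position not in self.acked:
                return position
        return None

    def acknowledge(self, position):
        """Record an acknowledgement, which moves the restart point forward."""
        self.acked.add(position)


topic_log = ["m0", "m1", "m2", "m3"]
sub = Subscription(topic_log)

# The consumer processes and acknowledges two messages, then crashes.
sub.acknowledge(0)
sub.acknowledge(1)

# After reattaching to the same subscription, delivery resumes at the cursor.
restart = sub.next_unacked()
print(restart, topic_log[restart])
```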
Geo-replication is the replication of persistently stored message data across multiple clusters of a Pulsar instance.
You can produce and consume messages in different geo-locations. For example, your application may be publishing data in one region or market and you would like to process it for consumption in other regions or markets. Geo-replication in Pulsar enables you to do that.
An instance is a group of Pulsar clusters that act together as a single unit.
Multi-tenancy allows you to support multiple organizations (or sub-organizations) within your company on a single platform. A single Pulsar cluster can support many tenants and allows you to map Pulsar topics to different teams, applications, or use cases. This hierarchical structure serves as the foundation of security and allows for unified, global management of multiple clusters.
The underlying components for multi-tenancy include:
- Instance - group of Pulsar clusters that act together as a single unit.
- Cluster - a group of brokers and bookies that create a secure messaging environment within Pulsar.
- Tenant - the administrative unit within a shared environment.
- Namespace - a grouping mechanism for related topics.
- Topic - a unit of storage that structures data in Pulsar and organizes messages into a stream.
See Multi-tenancy get started guide or watch a short video to learn more about multi-tenancy in Pulsar.
A namespace is a grouping mechanism for related topics. It allows teams to keep their data separate from that of other teams. Each namespace has its own policies.
You create a separate namespace for each application. A namespace allows the application to create and manage a hierarchy of topics. You can create any number of topics under the namespace.
For example, my-tenant/app1 is a namespace for the application "app1" under the tenant "my-tenant".
A producer sends messages to a Pulsar topic.
Pulsar producers play a crucial role in the Pulsar messaging system by generating and publishing messages to topics, which can be consumed by one or more consumers for various use cases such as real-time event streaming, messaging, and data processing.
You create producers using various programming languages such as Java, Python, Go, and C++.
Producers can operate in different modes, such as synchronous or asynchronous publishing.
- In synchronous mode, the producer blocks until the broker acknowledges receipt of the message.
- In asynchronous mode, the producer sends messages in the background, and the application can continue executing without waiting for acknowledgments.
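The difference between the two modes can be sketched without a running broker. In the example below, `broker_ack` is a hypothetical stand-in for the broker, and `send` and `send_async` are simplified illustrations whose signatures are assumptions rather than a client library's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def broker_ack(message):
    """Hypothetical broker: store the message, then acknowledge it."""
    time.sleep(0.01)  # simulated network and storage latency
    return f"ack:{message}"


executor = ThreadPoolExecutor(max_workers=4)


def send(message):
    """Synchronous publish: block until the acknowledgement arrives."""
    return executor.submit(broker_ack, message).result()


def send_async(message, callback):
    """Asynchronous publish: return at once and report the ack via a callback."""
    future = executor.submit(broker_ack, message)
    future.add_done_callback(lambda done: callback(done.result()))
    return future


sync_result = send("hello")  # the caller waits here for the ack

acks = []
pending = [send_async(f"event-{i}", acks.append) for i in range(3)]
# The application is free to do other work while the sends complete.

executor.shutdown(wait=True)  # wait for all sends and their callbacks
```

Asynchronous publishing generally gives higher throughput, while synchronous publishing makes it simpler to handle a failed send at the point where it happens.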
Pulsar producers also support various features such as message batching, compression, and encryption, which allow for efficient and secure message transmission.
The publish-subscribe (pub-sub) software design pattern provides a framework for exchanging messages between the sender of messages (publishers) and receivers of messages (subscribers).
In Pulsar, a publisher is called a producer and a subscriber is called a consumer.
View the Pub-Sub Animation to step through how the pub-sub model works.
A subscription is the binding between a topic (or a partition) and a consumer. It is a named configuration rule that determines how messages are delivered to consumers.
Consumers register their interest in a topic by creating a subscription. A topic can have multiple attached subscriptions.
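In Pulsar, you can choose from four subscription modes:
- Exclusive - allows only a single consumer to attach to the subscription at a time.
- Failover - allows multiple consumers to attach to the same subscription, but only one receives messages at a time; another takes over if it disconnects.
- Shared - allows multiple consumers to attach to the same subscription, with messages distributed among them.
- Key Shared - allows multiple consumers to attach to the same subscription, with messages distributed by ordering key so that messages with the same key go to the same consumer.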
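To make the differences between the modes concrete, here is a minimal dispatch sketch. The `dispatch` function is hypothetical, assumes all consumers stay healthy, and uses CRC32 only as a stand-in for whatever key hashing a real broker applies.

```python
import zlib
from itertools import cycle


def dispatch(messages, consumers, mode):
    """Return {consumer: [payloads]} for a toy subscription.

    messages is a list of (key, payload) pairs; consumers is ordered by
    connection time, so consumers[0] is treated as the active consumer.
    """
    if mode == "exclusive" and len(consumers) != 1:
        raise ValueError("an exclusive subscription allows exactly one consumer")
    received = {name: [] for name in consumers}
    round_robin = cycle(consumers)
    for key, payload in messages:
        if mode in ("exclusive", "failover"):
            target = consumers[0]        # one active consumer, others stand by
        elif mode == "shared":
            target = next(round_robin)   # spread messages across consumers
        elif mode == "key_shared":
            # the same key always maps to the same consumer
            target = consumers[zlib.crc32(key.encode()) % len(consumers)]
        else:
            raise ValueError(f"unknown mode: {mode}")
        received[target].append(payload)
    return received


messages = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]
failover = dispatch(messages, ["c1", "c2"], "failover")
shared = dispatch(messages, ["c1", "c2"], "shared")
key_shared = dispatch(messages, ["c1", "c2"], "key_shared")
```

In this sketch, failover delivers everything to `c1`, shared alternates between `c1` and `c2`, and key-shared keeps all messages for a given key on one consumer.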
For more information about subscriptions and subscription modes, see Pulsar messaging model.
A tenant is the administrative unit within a shared environment. You use tenants to manage storage quotas and message TTL and to set isolation policies.
A tenant also provides a security boundary. You can spread tenants across clusters and apply an authentication and authorization scheme to each one. You can also isolate tenants to different clusters.
Tiered storage is a storage architecture that uses multiple levels of storage media to optimize data management. Pulsar provides access to tiered storage for infinite message retention (without the need for external tools).
Instead of using your fast disks for historical data, you can use third-party cloud storage systems and move the data from BookKeeper into a more cost-effective storage tier. Pulsar clients can still access the data, so storing huge volumes of data in Pulsar stays manageable, with lower operational burden and cost.
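The idea can be sketched with a toy two-tier store. `TieredTopic` is a hypothetical model, with plain dictionaries standing in for BookKeeper and for cloud object storage; it only illustrates that readers use the same call whichever tier holds the data.

```python
class TieredTopic:
    """Toy two-tier store: recent segments stay fast, older ones are offloaded."""

    def __init__(self):
        self.fast_tier = {}   # segment id -> messages (stands in for BookKeeper)
        self.cold_tier = {}   # segment id -> messages (stands in for object storage)

    def append_segment(self, segment_id, messages):
        """Write a new segment to the fast tier."""
        self.fast_tier[segment_id] = list(messages)

    def offload(self, keep_recent):
        """Move all but the newest keep_recent segments to the cold tier."""
        ordered = sorted(self.fast_tier)
        cutoff = max(len(ordered) - keep_recent, 0)
        for segment_id in ordered[:cutoff]:
            self.cold_tier[segment_id] = self.fast_tier.pop(segment_id)

    def read(self, segment_id):
        """Read a segment from whichever tier currently holds it."""
        if segment_id in self.fast_tier:
            return self.fast_tier[segment_id]
        return self.cold_tier[segment_id]


topic = TieredTopic()
for segment_id in range(3):
    topic.append_segment(segment_id, [f"s{segment_id}-m0", f"s{segment_id}-m1"])

topic.offload(keep_recent=1)     # segments 0 and 1 move to the cold tier
historical = topic.read(0)       # still readable after offloading
```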
For more information, see supported object storage solutions.
A topic is a unit of storage that structures data in Pulsar and organizes messages into a stream.
You must provide a unique name for a topic. Topics are named using a URI structure to fully qualify the name in the form of:
{persistent|non-persistent}://tenant/namespace/topic
where persistent or non-persistent identifies the type of topic, tenant and namespace are the organizational units in the multi-tenancy model, and topic is an arbitrary string (usually composed of alphanumeric characters and characters such as "_" or "-").
Pulsar creates a topic under the namespace provided in the topic name automatically. You do not need to explicitly create topics. If no tenant or namespace is specified when a client creates a topic, the topic is created in the default tenant and namespace.
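A small parser can show how a short name expands into a fully qualified one. This sketch assumes the `{persistent|non-persistent}://tenant/namespace/topic` form and Pulsar's default tenant `public` and default namespace `default`; `parse_topic_name` and `TopicName` are hypothetical helpers, not a client API.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str
    tenant: str
    namespace: str
    topic: str


def parse_topic_name(name, default_tenant="public", default_namespace="default"):
    """Expand a short or fully qualified topic name into its four parts."""
    domain, separator, rest = name.partition("://")
    if not separator:                 # no scheme given: assume a persistent topic
        domain, rest = "persistent", name
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown topic domain: {domain}")
    parts = rest.split("/")
    if len(parts) == 1:               # short form: topic name only
        return TopicName(domain, default_tenant, default_namespace, parts[0])
    if len(parts) == 3:               # tenant/namespace/topic
        return TopicName(domain, *parts)
    raise ValueError(f"expected tenant/namespace/topic, got: {rest}")


short = parse_topic_name("my-topic")
full = parse_topic_name("non-persistent://my-tenant/app1/events")
```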
Pulsar also supports non-persistent topics, which are topics on which messages are never persisted to disk and live only in memory. When using non-persistent delivery, stopping a Pulsar broker or disconnecting a subscriber from a topic means that all in-transit messages are lost on that non-persistent topic. In non-persistent topics, brokers immediately deliver messages to all connected subscribers without persisting them in BookKeeper.
A normal (non-partitioned) topic is served by only a single broker, which limits the maximum throughput of the topic. Partitioned topics are a special type of topic handled by multiple brokers, allowing for higher throughput.
A partitioned topic is actually implemented as N internal topics, where N is the number of partitions. When publishing messages to a partitioned topic, each message is routed to one of several brokers. The distribution of partitions across brokers is handled automatically by Pulsar.
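The routing of messages to internal topics can be sketched as follows. `PartitionRouter` is a hypothetical class: it assumes round-robin routing for messages without a key and key hashing otherwise, and uses CRC32 as a stand-in hash. The `-partition-N` suffix follows the naming Pulsar uses for internal partition topics.

```python
import zlib
from itertools import count


class PartitionRouter:
    """Toy router that picks one of N internal topics for each message."""

    def __init__(self, topic, num_partitions):
        self.topic = topic
        self.num_partitions = num_partitions
        self._sequence = count()

    def partition_for(self, key=None):
        """Return the internal topic name for a message."""
        if key is None:
            # no key: rotate through the partitions
            index = next(self._sequence) % self.num_partitions
        else:
            # same key, same partition (preserves per-key ordering)
            index = zlib.crc32(key.encode()) % self.num_partitions
        return f"{self.topic}-partition-{index}"


router = PartitionRouter("persistent://my-tenant/app1/orders", num_partitions=3)
unkeyed = [router.partition_for() for _ in range(3)]
keyed = [router.partition_for("customer-42") for _ in range(2)]
```

Because each internal topic can be owned by a different broker, spreading messages across partitions spreads the load across brokers.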
It's recommended to have at least one partition per topic so that you can add more partitions in the future. If there are zero partitions (a non-partitioned topic), you will not be able to add more partitions to the topic after it is created. You can have from 1 up to 100 partitions per topic.
Pulsar uses a distributed architecture where topics are partitioned across multiple brokers for scalability and fault-tolerance.
A topic bundle is a group of topics that are assigned to a single broker for handling. To achieve this, Pulsar divides the topics into bundles, where each bundle is assigned to a specific broker. This ensures that the load is evenly distributed across brokers and that each broker is responsible for a subset of topics.
When a new topic is created, Pulsar assigns it to a specific bundle based on a hash of the topic name and the number of bundles configured for its namespace. If the number of bundles changes, Pulsar will rebalance the topics across the new set of bundles.
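The assignment of topics to bundles can be approximated by splitting a hash space into contiguous ranges. This is a sketch under stated assumptions: CRC32 stands in for the real hash function, the helper names are hypothetical, and the modulo-based mapping of bundles to brokers is a simplification, since a real cluster places bundles according to load.

```python
import zlib
from bisect import bisect_right

HASH_SPACE = 2 ** 32


def bundle_boundaries(num_bundles):
    """Split the 32-bit hash space into num_bundles contiguous ranges."""
    return [i * HASH_SPACE // num_bundles for i in range(num_bundles + 1)]


def bundle_for(topic, boundaries):
    """Return the index of the bundle whose hash range contains the topic."""
    topic_hash = zlib.crc32(topic.encode())   # 0 <= topic_hash < 2**32
    return bisect_right(boundaries, topic_hash) - 1


num_bundles = 4
boundaries = bundle_boundaries(num_bundles)

# Simplified ownership: each bundle belongs to exactly one broker.
brokers = ["broker-1", "broker-2"]
owner = {bundle: brokers[bundle % len(brokers)] for bundle in range(num_bundles)}

topics = [f"persistent://my-tenant/app1/topic-{i}" for i in range(8)]
placement = {topic: owner[bundle_for(topic, boundaries)] for topic in topics}
```

Because a broker owns whole bundles rather than individual topics, moving load between brokers means reassigning a bundle, not tracking every topic separately.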
Topic bundles are an important concept in Pulsar as they play a crucial role in ensuring that topics are evenly distributed across brokers and that the cluster is scalable and fault-tolerant.