Glossary of Terms & Concepts
Are you new to StreamNative? Trying to learn and understand? Listed below are terms and concepts relevant to understanding StreamNative products.
Note
If you have feedback about terms and definitions you'd like to see included in this glossary, please email us.
Apache BookKeeper
BookKeeper is a distributed write-ahead log (WAL) system or distributed journal. Pulsar uses Apache BookKeeper for persistent message storage. By default, Pulsar persistently stores all unacknowledged messages on multiple BookKeeper bookies (storage nodes).
BookKeeper is a scalable, fault-tolerant, and low-latency storage service optimized for real-time workloads.
Broker
A broker receives messages from producers, stores them, and then delivers them to subscribed consumers.
A broker is the message dispatcher responsible for receiving messages from clients and sending messages to them. You typically have multiple brokers in a cluster, so if messages get backed up or a broker goes down, another broker can take on the extra load. This transfer can happen quickly due to the stateless nature of brokers.
Cluster
A cluster is a secure messaging environment within Pulsar. Each Pulsar cluster consists of a set of three components in a geographical location.
- Pulsar brokers - set of brokers handling all the data going in and out of Pulsar (or client requests)
- Metadata storage - providing coordination and service discovery between services
- Bookie ensemble - set of bookies that retain copies of the messages
A cluster has two layers: a stateless serving layer (made up of brokers) and a stateful storage layer (made up of bookies). See also Pulsar Architecture and Design.
Note
In StreamNative Console, you can create one and only one cluster for an instance.
Consumer
A consumer processes incoming messages and takes action based on the content of the message. In Pulsar, messages are sent to a specific topic, which is a logical name for a stream of data. A consumer subscribes to a topic and receives all messages published to that topic.
To receive a message, the consumer needs to send a request to the broker that handles this message. The message is dispatched when the client permits the broker to push it. Typically, the consumer uses a queue to accumulate the messages to consume (you can configure the receiverQueueSize).
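The push-with-permission behavior described above can be pictured with a small toy model. This is only a sketch, not the Pulsar client: the `ToyConsumer` class and the `push` function are hypothetical names, and the real protocol's permit accounting is more involved. The only idea carried over from the text is that the broker pushes no more messages than the consumer's receiver queue has room for.

```python
from collections import deque


class ToyConsumer:
    """Hypothetical consumer with a bounded receiver queue."""

    def __init__(self, receiver_queue_size):
        self.receiver_queue_size = receiver_queue_size
        self.queue = deque()

    def permits(self):
        # Free slots in the queue: how many messages the broker may push.
        return self.receiver_queue_size - len(self.queue)

    def receive(self):
        # The application takes the oldest queued message.
        return self.queue.popleft()


def push(backlog, consumer):
    """Broker side: push messages only while the consumer grants permits."""
    pushed = 0
    while backlog and consumer.permits() > 0:
        consumer.queue.append(backlog.popleft())
        pushed += 1
    return pushed


backlog = deque(["m0", "m1", "m2", "m3", "m4"])
consumer = ToyConsumer(receiver_queue_size=2)

first_push = push(backlog, consumer)    # queue fills up, the rest stays on the broker
first_message = consumer.receive()      # frees one slot
second_push = push(backlog, consumer)   # broker may push one more message
```

A larger queue lets the broker push more messages ahead of the application, at the cost of more memory held by the consumer.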
Pulsar manages the subscription cursor which determines the starting position to read data for consumers. Specifically, consumers read from the earliest unacknowledged message. If you need to manually manage the cursor and customize the starting position for consumers, go to readers.
Cursor
Subscriptions use cursors to track message consumption. Each subscription for a topic has a cursor, which acts as a "restart" point: it records which messages have been acknowledged, so Pulsar can resume delivery from a specific message. When a consumer reads and processes a message, it sends an acknowledgment to the Pulsar broker and the cursor is updated. Updating the cursor ensures that the consumer will not receive that message again, even when the consumer crashes, recovers, and reattaches to the subscription.
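As a rough illustration of the cursor as a restart point, the following toy model keeps acknowledgment state on the subscription rather than on the consumer. It is a sketch under simplifying assumptions: the `Subscription` class and its methods are hypothetical and do not correspond to the Pulsar client API.

```python
class Subscription:
    """Toy model of a subscription cursor (not the Pulsar client API)."""

    def __init__(self, log):
        self.log = log        # the topic's messages, in publish order
        self.acked = set()    # positions of acknowledged messages

    def next_unacked(self):
        """Return (position, message) for the earliest unacknowledged message."""
        for position, message in enumerate(self.log):
            if position not in self.acked:
                return position, message
        return None

    def acknowledge(self, position):
        self.acked.add(position)


topic_log = ["m0", "m1", "m2"]
subscription = Subscription(topic_log)

position, message = subscription.next_unacked()   # a consumer receives m0
subscription.acknowledge(position)                # the cursor records m0 as processed

# Suppose the consumer crashes here. The cursor state lives with the
# subscription, so a consumer that reattaches resumes after m0.
resumed = subscription.next_unacked()
```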
Geo-Replication
Geo-replication is the replication of persistently stored message data across multiple clusters of a Pulsar instance.
You can produce and consume messages in different geo-locations. For example, your application may be publishing data in one region or market and you would like to process it for consumption in other regions or markets. Geo-replication in Pulsar enables you to do that.
Instance
A Pulsar instance is a group of Pulsar clusters that act together as a single unit. Clusters can be distributed across geographical locations and can replicate amongst themselves using geo-replication. For details about how to work with instances, such as creating, editing, checking, and deleting instances, see work with instances.
With StreamNative Cloud, you can create a regional Pulsar cluster. A regional cluster has replicas running in multiple Availability Zones (AZs) within a given region. This arrangement maximizes availability but involves more inter-zone network traffic. Currently, three AZs are used per regional cluster, and StreamNative Cloud supports multi-AZ deployments only.
Multi-Tenancy
Multi-tenancy allows you to support multiple organizations (or sub-organizations) within your company on a single platform. A single Pulsar cluster can support many tenants and allows you to map Pulsar topics to different teams, applications, or use cases. This hierarchical structure serves as the foundation of security and allows for unified, global management of multiple clusters.
The underlying components for multi-tenancy include:
- Instance - group of Pulsar clusters that act together as a single unit.
- Cluster - a group of brokers and bookies that create a secure messaging environment within Pulsar.
- Tenant - the administrative unit within a shared environment.
- Namespace - a grouping mechanism for related topics.
- Topic - a unit of storage that structures data in Pulsar and organizes messages into a stream.
See Multi-tenancy Get Started Guide or watch a short video to learn more about multi-tenancy in Pulsar.
Namespace
A namespace is a grouping mechanism for related topics. It allows teams to keep their data separate. Each namespace has its own policies.
You typically create a separate namespace for each application. A namespace allows the application to create and manage a hierarchy of topics. You can create any number of topics under the namespace. For example, my-tenant/app1 is a namespace for the application "app1" under the tenant "my-tenant".
The configuration policies you set on a namespace apply to all the topics created in that namespace. You can create multiple namespaces for a tenant using the StreamNative Console, REST API or the pulsar-admin CLI tool.
For details about how to create namespaces, see work with namespace.
For more information about namespace details, including permissions, backlog quotas, retention policies, bundles, and dispatch rates, see the Concepts section.
Organization
In StreamNative Cloud, organizations are intended for use in environments with many users spread across multiple teams. They are used to divide cluster resources between multiple users (through resource quotas). Names of resources must be unique within an organization. Organizations cannot be nested and each snctl resource can only be in one organization.
When you sign up for service with StreamNative Cloud, you provide a descriptive name for your first organization. A system-generated, random string is also assigned to your organization upon creation. You can see the random string next to the descriptive name on the Dashboard, as shown in the figure below.
The Pulsar clusters and other resources are owned by your organization. Organizations are team-based. As an organization administrator, you control access to the organization by adding and removing members and by granting permissions to them. Currently, you can't delete an organization through either the StreamNative Console or the CLI. If you need to delete an organization, please submit a ticket.
For details about how to create an organization, see work with organizations.
Producer
A producer sends messages to a Pulsar topic.
Pulsar producers play a crucial role in the Pulsar messaging system by generating and publishing messages to topics, which can be consumed by one or more consumers for various use cases such as real-time event streaming, messaging, and data processing.
You create producers using various programming languages such as Java, Python, Go, and C++.
Producers can operate in different modes, such as synchronous or asynchronous publishing.
- In synchronous mode, the producer blocks until the broker acknowledges receipt of the message.
- In asynchronous mode, the producer sends messages in the background, and the application can continue executing without waiting for acknowledgments.
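The difference between the two modes can be sketched with a toy producer built on Python's standard thread pool. This is an illustration only: `ToyBroker` and `ToyProducer` are hypothetical stand-ins, the message IDs are made up, and nothing here talks to a real broker.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ToyBroker:
    """Hypothetical broker that acknowledges a message after a short delay."""

    def __init__(self):
        self.stored = []

    def publish(self, payload):
        time.sleep(0.01)                # stand-in for network and storage latency
        self.stored.append(payload)
        return len(self.stored) - 1     # a made-up message ID


class ToyProducer:
    def __init__(self, broker):
        self._broker = broker
        # A single worker keeps messages in the order they were sent.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def send(self, payload):
        """Synchronous: block until the broker acknowledges the message."""
        return self._pool.submit(self._broker.publish, payload).result()

    def send_async(self, payload, callback):
        """Asynchronous: return at once; callback(message_id) runs on acknowledgment."""
        future = self._pool.submit(self._broker.publish, payload)
        future.add_done_callback(lambda done: callback(done.result()))
        return future

    def close(self):
        self._pool.shutdown(wait=True)  # wait for pending sends to finish


broker = ToyBroker()
producer = ToyProducer(broker)

sync_id = producer.send("order-1")            # returns only after the acknowledgment

acks = []
producer.send_async("order-2", acks.append)   # returns before the acknowledgment
producer.send_async("order-3", acks.append)
producer.close()
```

Synchronous sends are simpler to reason about, while asynchronous sends let the application keep working and usually give higher throughput.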
Pulsar producers also support various features such as message batching, compression, and encryption, which allow for efficient and secure message transmission.
Publish-Subscribe Model
The publish-subscribe (pub-sub) software design pattern provides a framework for exchanging messages between the sender of messages (publishers) and receivers of messages (subscribers).
In Pulsar, a publisher is called a producer and a subscriber is called a consumer.
View the Pub-Sub Animation to step through how the pub-sub model works.
Schema
Pulsar has a built-in schema registry that enables clients to upload data schemas on a per-topic basis. Those schemas dictate which data types are recognized as valid for that topic. Pulsar schema enables you to use language-specific types of data when constructing and handling messages, from simple types like string to more complex application-specific types.
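To make the per-topic idea concrete, here is a minimal sketch of a registry that maps each topic to one accepted type. It is a toy model under stated assumptions: `ToySchemaRegistry`, `RiskEvent`, and the topic names are hypothetical, and the real schema registry also handles serialization formats and schema versions, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class RiskEvent:
    """Hypothetical application-specific message type."""
    account: str
    score: float


class ToySchemaRegistry:
    """Toy per-topic registry: each topic accepts exactly one Python type."""

    def __init__(self):
        self._schemas = {}   # topic name -> accepted type

    def upload(self, topic, message_type):
        self._schemas[topic] = message_type

    def validate(self, topic, value):
        expected = self._schemas[topic]
        if not isinstance(value, expected):
            raise TypeError(f"{topic} expects {expected.__name__}, got {type(value).__name__}")
        return value


registry = ToySchemaRegistry()
registry.upload("persistent://compliance/risk/notes", str)
registry.upload("persistent://compliance/risk/risk-detection", RiskEvent)

accepted = registry.validate(
    "persistent://compliance/risk/risk-detection", RiskEvent("acct-1", 0.9)
)

try:
    registry.validate("persistent://compliance/risk/risk-detection", "just a string")
    rejected = False
except TypeError:
    rejected = True
```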
Server Pool
A server pool is an abstract definition of the compute, storage, and networking needed to host Pulsar instances. Currently, only shared and shared-aws server pools are available for StreamNative Cloud.
The following table lists the relationship between the server pool where the instance is located and the location of clusters available for the instance.
Server pool | Description | Cluster location |
---|---|---|
shared | Instances are hosted on Google Cloud. | asia-south1, europe-west1, europe-west3, us-central1, us-east4, us-west1 |
shared-aws | Instances are hosted on AWS. | ap-south-1, ap-southeast-2, eu-central-1, eu-west-1, us-east-1, us-east-2 |
Service Account
You can create service accounts to authenticate automated clients, such as bots, that operate on your organization. For example, a GitHub action or Jenkins job can use a service account to automatically provision a Pulsar cluster.
When you create a service account, you receive a JSON document called a key file that contains the secret credentials for the service account. It is your responsibility to protect the key file. (You can use the key file to authenticate to both the Cloud API and managed Pulsar clusters.)
The StreamNative Console uses role-based access control. As an organization administrator, you grant permission to access resources by assigning roles to users and to service accounts. Role assignments control access to the StreamNative Cloud API and to the Pulsar clusters that you provision.
Each organization has a built-in admin role, allowing full control of organization resources, including "Super Admin" access to the organization's Pulsar clusters. Currently, all logged-in users have the same "admin" level access.
For details about how to create a service account, see work with service accounts.
StreamNative Cloud
StreamNative Cloud is the industry's only fully-managed, cloud-native messaging and event streaming platform powered by Apache Pulsar. Apache Pulsar is an open-source, distributed pub/sub messaging and event streaming platform that enables industry leaders globally to build pub/sub messaging and event-driven applications at scale.
Built and operated by the original developers of Apache Pulsar and Apache BookKeeper, StreamNative Cloud provides a scalable, resilient, and secure messaging and event streaming platform for enterprises. You can sign up for StreamNative Cloud through the StreamNative website to create and manage StreamNative Cloud resources and Pulsar components.
Subscription
A subscription is the binding between a topic (or a partition) and a consumer. It is a named configuration rule that determines how messages are delivered to consumers.
Consumers register their interest in a topic by creating a subscription. A topic can have multiple attached subscriptions.
In Pulsar, you can choose among four subscription types:
- Exclusive - allows only a single consumer to be attached to the subscription at a time.
- Failover - allows multiple consumers to attach to the same subscription, but only one active consumer receives messages; if it disconnects, another consumer takes over.
- Shared - allows multiple consumers to attach to the same subscription, with messages distributed among them.
- Key Shared - allows multiple consumers to attach to the same subscription, with messages that share an ordering key delivered to the same consumer.
For more information about subscriptions and subscription types, see Pulsar messaging model.
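The delivery behavior of the four options listed above can be compared with a toy dispatcher. This is a simplified sketch, not how the broker is implemented: the `dispatch` function is hypothetical, the first consumer in the list is assumed to be the active one for Failover, and Key Shared is approximated with a CRC32 hash modulo the number of consumers.

```python
import itertools
import zlib


def dispatch(subscription_type, consumers, messages):
    """Return {consumer: [payloads]} for a toy version of each option.

    consumers: consumer names, in the order they connected.
    messages: (key, payload) pairs, in publish order.
    """
    delivered = {consumer: [] for consumer in consumers}
    if subscription_type == "exclusive":
        if len(consumers) != 1:
            raise ValueError("an exclusive subscription accepts a single consumer")
        delivered[consumers[0]] = [payload for _, payload in messages]
    elif subscription_type == "failover":
        # Only the active consumer receives messages; the others stand by.
        delivered[consumers[0]] = [payload for _, payload in messages]
    elif subscription_type == "shared":
        # Messages are spread over the consumers in round-robin order.
        for consumer, (_, payload) in zip(itertools.cycle(consumers), messages):
            delivered[consumer].append(payload)
    elif subscription_type == "key_shared":
        # Messages with the same key always land on the same consumer.
        for key, payload in messages:
            index = zlib.crc32(key.encode("utf-8")) % len(consumers)
            delivered[consumers[index]].append(payload)
    else:
        raise ValueError(f"unknown subscription type: {subscription_type}")
    return delivered


messages = [("k1", "a"), ("k2", "b"), ("k1", "c"), ("k2", "d")]
shared = dispatch("shared", ["c1", "c2"], messages)
failover = dispatch("failover", ["c1", "c2"], messages)
key_shared = dispatch("key_shared", ["c1", "c2"], messages)
```

In this model, Shared trades ordering for parallelism, while Key Shared keeps per-key ordering because one consumer sees every message for a given key.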
Tenant
Pulsar was created from the ground up as a multi-tenant system. To support multi-tenancy, Pulsar has a concept of tenants. A tenant is the administrative unit within a shared environment. You use tenants to manage storage quotas, set message TTL, and apply isolation policies.
A tenant also provides a security boundary. You can spread tenants across clusters and apply an authentication and authorization scheme to each one. You can also isolate tenants to different clusters.
For details about how to create tenants, see work with tenant.
Tiered Storage
Tiered storage is a storage architecture that uses multiple levels of storage media to optimize data management. Pulsar provides access to tiered storage for infinite message retention (without the need for external tools).
Instead of using your fast disks for historical data, you can use third-party cloud storage systems and move the data from BookKeeper into a more cost-effective storage tier. Pulsar clients can still access the data, which makes storing huge volumes of data in Pulsar manageable by reducing operational burden and cost.
For more information, see supported object storage solutions.
Topic
A topic is a unit of storage that structures data in Pulsar and organizes messages into a stream.
You must provide a unique name for a topic. Topic names use a URI structure that fully qualifies the name, in the form:
persistent://[tenant]/[namespace]/[topic]
where tenant and namespace are the organizational units in the multi-tenancy model and topic is an arbitrary string (usually made up of alphanumeric characters and characters such as "_" or "-").
For example: persistent://compliance/risk/risk-detection
Pulsar creates a topic under the namespace provided in the topic name automatically. You do not need to explicitly create topics. If no tenant or namespace is specified when a client creates a topic, the topic is created in the default tenant and namespace.
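The naming structure can be illustrated with a small parser. This is a sketch rather than an official utility: `TopicName` and `parse_topic_name` are hypothetical names, and the defaults `public` (tenant) and `default` (namespace) are the names used by open-source Apache Pulsar, assumed here because the text above does not spell them out.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str
    tenant: str
    namespace: str
    topic: str


def parse_topic_name(name, default_tenant="public", default_namespace="default"):
    """Split a topic name into its parts, filling in defaults for short names."""
    if "://" in name:
        domain, rest = name.split("://", 1)
    else:
        domain, rest = "persistent", name
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown topic domain: {domain}")

    parts = rest.split("/")
    if len(parts) == 1:
        # Short form: only the topic is given.
        tenant, namespace, topic = default_tenant, default_namespace, parts[0]
    elif len(parts) == 3:
        tenant, namespace, topic = parts
    else:
        raise ValueError(f"expected [tenant]/[namespace]/[topic], got: {rest}")

    if not (tenant and namespace and topic):
        raise ValueError(f"empty component in topic name: {name}")
    return TopicName(domain, tenant, namespace, topic)


full = parse_topic_name("persistent://compliance/risk/risk-detection")
short = parse_topic_name("risk-detection")
```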
Non-persistent topics
Pulsar also supports non-persistent topics, which are topics on which messages are never persisted to disk and live only in memory. When using non-persistent delivery, stopping a Pulsar broker or disconnecting a subscriber from a topic means that all in-transit messages are lost on that non-persistent topic. In non-persistent topics, brokers immediately deliver messages to all connected subscribers without persisting them in BookKeeper.
For more information about non-persistent topics, see non-persistent topics.
Partitioned topics
Normal topics are served by only a single broker, which limits the maximum throughput of the topic. Partitioned topics are a special type of topic handled by multiple brokers, allowing for higher throughput.
A partitioned topic is actually implemented as N internal topics, where N is the number of partitions. When publishing messages to a partitioned topic, each message is routed to one of several brokers. The distribution of partitions across brokers is handled automatically by Pulsar.
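A rough sketch of how a partitioned topic expands into N internal topics, and how a keyed message could be routed to one of them, is shown below. The `-partition-N` suffix follows Pulsar's naming for internal topics. The routing is an assumption made for illustration: the function names are hypothetical, and CRC32 stands in for whatever hash the client actually uses.

```python
import zlib


def partition_names(topic, num_partitions):
    """Names of the N internal topics behind one partitioned topic."""
    return [f"{topic}-partition-{i}" for i in range(num_partitions)]


def route(key, partitions):
    """Pick a partition for a message key (CRC32 used as a stand-in hash)."""
    return partitions[zlib.crc32(key.encode("utf-8")) % len(partitions)]


partitions = partition_names("persistent://compliance/risk/risk-detection", 3)

first = route("acct-1", partitions)
second = route("acct-1", partitions)   # same key, same partition
```

Because each internal topic can be served by a different broker, spreading keys over partitions is what spreads the load over brokers.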
It's recommended to have at least one partition per topic so that you can add more partitions in the future. If there are zero partitions (a non-partitioned topic), you will not be able to add more partitions to the topic after it is created.
For more information about partitioned topics, see partitioned topics.
Topic Bundle
Pulsar uses a distributed architecture where topics are partitioned across multiple brokers for scalability and fault-tolerance.
A topic bundle is a group of topics that are assigned to a single broker for handling. To achieve this, Pulsar divides the topics into bundles, where each bundle is assigned to a specific broker. This ensures that the load is evenly distributed across brokers and that each broker is responsible for a subset of topics.
When a new topic is created, Pulsar assigns it to a specific bundle based on the topic name and the number of bundles configured for its namespace. If the number of bundles changes, Pulsar will rebalance the topics across the new set of bundles.
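One way to picture the assignment is as a hash of the topic name falling into one of several contiguous hash ranges, with each range acting as a bundle. The sketch below works under that assumption: the function names are hypothetical, the 32-bit hash space and CRC32 are stand-ins, and the real broker's hashing and load balancing are not reproduced here.

```python
import bisect
import zlib

HASH_SPACE = 2 ** 32   # assumed size of the hash space


def bundle_boundaries(num_bundles):
    """Split [0, HASH_SPACE) into num_bundles contiguous ranges."""
    return [i * HASH_SPACE // num_bundles for i in range(num_bundles + 1)]


def bundle_for(topic, boundaries):
    """Return the (low, high) hash range of the bundle that owns the topic."""
    topic_hash = zlib.crc32(topic.encode("utf-8"))   # value in [0, 2**32)
    index = bisect.bisect_right(boundaries, topic_hash) - 1
    return boundaries[index], boundaries[index + 1]


topic = "persistent://compliance/risk/risk-detection"

four = bundle_boundaries(4)
low4, high4 = bundle_for(topic, four)

# Doubling the number of bundles splits every range in two, so the topic
# moves to a narrower range that lies inside its previous one.
eight = bundle_boundaries(8)
low8, high8 = bundle_for(topic, eight)
```

Since a broker owns whole bundles rather than individual topics, moving load between brokers only requires reassigning a range.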
Topic bundles are an important concept in Pulsar as they play a crucial role in ensuring that topics are evenly distributed across brokers and that the cluster is scalable and fault-tolerant.
User
In StreamNative Console, users are identified by their email address and authenticated through social login, by a username/password combination, or through SSO. As an organization administrator, you invite users to an organization, and they receive an email to complete the registration. For details about how to manage or invite a user, see work with users.
Uniform Resource Name (URN)
An instance is identified by a Uniform Resource Name (URN). The format of an instance URN is "urn:sn:pulsar:pulsar-instance-namespace:pulsar-instance-name". You can get the organization name and the instance name through the snctl get organizations and snctl get pulsarinstance commands. When a Pulsar client connects to a Pulsar cluster through the OAuth2 authentication method, the URN is a required field for OAuth2 authentication.
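As a small illustration, the URN can be assembled from the two names returned by those commands. The helper below is hypothetical, and it assumes that the organization name fills the pulsar-instance-namespace slot, which is how the text above reads. The names `my-org` and `my-instance` are placeholders.

```python
def instance_urn(organization, instance):
    """Build an instance URN of the form urn:sn:pulsar:<namespace>:<name>."""
    for part in (organization, instance):
        if not part or ":" in part:
            raise ValueError(f"invalid URN component: {part!r}")
    return f"urn:sn:pulsar:{organization}:{instance}"


urn = instance_urn("my-org", "my-instance")
```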