
Concepts

StreamNative Cloud is the industry’s only fully-managed, cloud-native messaging and event streaming platform powered by Apache Pulsar. Apache Pulsar is an open-source, distributed pub/sub messaging and event streaming platform that enables industry leaders globally to build pub/sub messaging and event-driven applications at scale. Built and operated by the original developers of Apache Pulsar and Apache BookKeeper, StreamNative Cloud provides a scalable, resilient, and secure messaging and event streaming platform for enterprises. You can sign up for StreamNative Cloud through the StreamNative website to create and manage StreamNative Cloud resources and Pulsar components.

StreamNative Cloud resources

Organizations

Organizations are intended for use in environments with many users spread across multiple teams. They are used to divide cluster resources between multiple users (through resource quotas). Names of resources must be unique within an organization. Organizations cannot be nested and each snctl resource can only be in one organization.

When you sign up for service with StreamNative Cloud, you provide a descriptive name for your first organization. A system-generated, random string is also assigned to your organization upon creation. You can see the random string next to the descriptive name on the Dashboard, as shown in the figure below.

[Figure: screenshot of the Dashboard showing the organization name]

The Pulsar clusters and other resources are owned by your organization. Organizations are team-based. As an organization administrator, you control access to the organization by adding and removing members and by granting permissions to them. Currently, you can't delete an organization through either the StreamNative Cloud Console or the CLI. If you need to delete an organization, please submit a ticket.

For details about how to create an organization, see work with organizations.

Users

Users are identified by their email address and authenticated through social login, by a username/password combination, or through SSO. As an organization administrator, you invite users to an organization, and they receive an email to complete the registration. For details about how to create or invite users, see work with users.

Service accounts

Service accounts may be created for automation purposes, such as to authenticate bots that operate on your organization. For example, a service account could be used by a GitHub action or Jenkins job to automatically provision a Pulsar cluster. When you create a service account, you receive a JSON document called a key file that contains the secret credentials for the service account. It is your responsibility to protect the key file. The key file can be used to authenticate to both the Cloud API and to managed Pulsar clusters.
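Because protecting the key file is your responsibility, a small pre-flight check can help catch a key file that other users on the machine can read. The following is a minimal sketch for POSIX systems only. The function name is hypothetical and the check is not part of any StreamNative tooling.

```python
import os
import stat


def key_file_is_private(path: str) -> bool:
    """Return True if only the file owner has any access to the key file.

    Hypothetical helper: it inspects POSIX permission bits and reports
    whether the group and other users have no read, write, or execute access.
    """
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For example, a file with mode 600 passes the check, while a file with mode 644 does not.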

For details about how to create a service account, see work with service accounts.

Access control

StreamNative Cloud uses role-based access control. As an organization administrator, you grant permission to access resources by assigning roles to users and to service accounts. Role assignments control access to the StreamNative Cloud API and to the Pulsar clusters that you provision.

Each organization has a built-in admin role, allowing full control of organization resources, including "Super Admin" access to the organization’s Pulsar clusters. Currently, all logged-in users have the same "admin" level access.

Custom roles that allow fine-grained permissions will be supported in a future release, for example, a role that grants Super Admin permissions for a specific cluster (as opposed to all clusters).

Server pool

A server pool is an abstract definition of the compute, storage, and networking needed to host Pulsar instances. Currently, only shared and shared-aws server pools are available for StreamNative Cloud. The following table lists the relationship between the server pool where the instance is located and the location of clusters available for the instance.

Server pool | Description | Cluster location
shared | Instances are hosted on GCP. | us-east4
shared-aws | Instances are hosted on AWS. | us-east-1

Instances

A Pulsar instance consists of one or more Pulsar clusters working in unison. Clusters can be distributed across geographical locations and can replicate amongst themselves using geo-replication. For details about how to work with instances, such as creating, editing, checking, and deleting instances, see work with instances.

In this release, only one cluster is available for each instance.

Subscription plan

A subscription is an agreement between you and StreamNative to pay for service on a particular schedule. In the current release, when you create an instance and a cluster on the StreamNative Cloud Console, you are automatically enrolled in the default pay-as-you-go subscription plan. If you have legacy clusters, submit a ticket to get assistance with moving them to the updated subscription plan.

If you want to provision a cluster with snctl instead of on StreamNative Cloud Console, you’ll need to first create a subscription with snctl. For more information about snctl, see the snctl reference documentation.

For more information about viewing your invoices, what resources are included on your invoice, how to update your payment information, and more, see the billing documentation page.

Availability mode

With StreamNative Cloud, you can create a Pulsar cluster tailored to the availability requirements of your workload and your budget. The types of availability are regional and zonal.

  • A regional cluster has replicas running in multiple Availability Zones (AZs) within a given region. This arrangement maximizes availability but involves more inter-zone network traffic. In this release, three AZs are used per regional cluster.

  • A zonal cluster concentrates all replicas in a single AZ. This arrangement offers better performance and a lower cost, at the expense of reduced availability.

Clusters

A Pulsar cluster consists of a set of Pulsar brokers, ZooKeeper servers, and bookies in a geographical location. A Pulsar instance consists of one or more Pulsar clusters working in unison. Clusters can be distributed across geographical locations and can replicate amongst themselves using geo-replication.

The following table lists features and usage limits supported by the cluster.

Type | Features | Capability
Service | Uptime SLA | 99.95%
Service | Single AZ | Yes
Service | Multi AZ | Yes
Scale | Throughput limit per topic | Max 100 MBps
Scale | Storage limit per topic | Max 1000 TB
Scale | Tenant limit | Max 128
Scale | Namespace limit | Max 1024
Scale | Topic limit | Max 10240
Pulsar components | Pub/Sub | Yes
Cloud providers | GCP | Yes
Cloud providers | AWS | Yes
Monitoring & visibility | Cloud Console | Yes
Security | At-rest & In-transit data encryption | Yes
Security | OAuth2 authentication | Yes

For details about how to work with clusters, such as creating, editing, checking, and deleting clusters, see work with clusters.

Cluster location

Each Pulsar cluster is associated with a geographical location. The following locations are available:

Identifier | Location
ap-south-1 | Asia Pacific (Mumbai)
ap-southeast-2 | Asia Pacific (Sydney)
eu-west-1 | Europe (Ireland)
us-east-1 | US East (N. Virginia)
us-east-2 | US East (Ohio)
us-east-4 | Ashburn, Northern Virginia, USA

Cluster features

This section describes features that can be enabled for a Pulsar cluster.

WebSocket

The Pulsar WebSocket API provides a simple way to interact with Pulsar using languages that do not have an official client library. Through the WebSocket API, you can use the WebSocket producer and consumer to produce and consume messages to and from a topic. For details, see connect to cluster through WebSocket API.

Kafka protocol

You can add the Kafka protocol to your Pulsar cluster. With Kafka protocol enabled, you can migrate your existing Kafka applications and services to Pulsar without modifying the code. Currently, this is a beta feature. If you want to try out this feature, submit a ticket with this request to the support team.

Audit log

Audit logs track and store authorization activities in Pulsar clusters, tenants, namespaces, and topics.

Cluster configurations

When you provision a Pulsar cluster, StreamNative Cloud automatically configures the security, networking, compute, and storage aspects of the cluster. Various scaling parameters are customizable based on your workload and your budget. These parameters stem from the Pulsar architecture and are specified on a per-component basis.

Broker parameters

Parameter | Default value | Description
Replicas | 1 | Specify the number of brokers.
Node type | tiny-1 | Specify the compute characteristics (CPU, memory) per broker.

BookKeeper parameters

Parameter | Default value | Description
Replicas | 3 | Specify the number of bookies.
Node type | tiny-1 | Specify the compute characteristics (CPU, memory) per bookie.

Backlog parameters

Parameter | Default value | Description
Backlog Quota Size | -1 | Specify the backlog quota size.
Backlog Retention Policy | N/A | Specify the retention policy when the backlog quota threshold is reached.
  - producer_request_hold: the broker holds producers' send requests until the resource becomes available or until the hold times out.
  - producer_exception: the broker rejects producers' send requests.
  - consumer_backlog_eviction: the broker evicts the oldest messages from the slowest consumer's backlog.

Throttling parameters

Parameter | Default value | Description
Max Producers Per Topic | N/A | Specify the maximum number of producers allowed to connect to a topic.
Max Consumers Per Topic | N/A | Specify the maximum number of consumers allowed to connect to a topic.
Max Consumers Per Subscription | N/A | Specify the maximum number of consumers allowed to connect to a subscription.

Dispatch rate parameters

Parameter | Default value | Description
Dispatch Rate Per Topic | N/A | Specify the topic-based dispatch rate.
  - Throughput (bytes/second): specify the topic-based dispatch rate (in bytes/second).
  - Rate (messages/second): specify the topic-based dispatch rate (in messages/second).
Dispatch Rate Per Subscription | N/A | Specify the subscription-based dispatch rate.
  - Throughput (bytes/second): specify the subscription-based dispatch rate (in bytes/second).
  - Rate (messages/second): specify the subscription-based dispatch rate (in messages/second).
Subscribe Rate Per Consumer | N/A | Specify the subscribe rate for each consumer.
  - Rate (subscribes/second): specify the subscribe rate for each consumer (in subscribes/second).

Service URLs

Once provisioned, a Pulsar cluster is accessible through service URLs. A service URL is an HTTPS endpoint that is exposed to the Internet and protected by OAuth 2.0. The Fully Qualified Domain Name (FQDN) of a service endpoint is based on the name of the Pulsar cluster. For details about how to get service URLs, see get a service URL.

URN

An instance is identified by a Uniform Resource Name (URN). The format of an instance URN is "urn:sn:pulsar:pulsar-instance-namespace:pulsar-instance-name", where the instance namespace is the organization name. You can get the organization name and the instance name through the snctl get organizations and snctl get pulsarinstance commands. When a Pulsar client connects to a Pulsar cluster through the OAuth2 authentication plugin, the URN is a required field for OAuth2 authentication.
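As an illustration, the following sketch assembles an instance URN from an organization name and an instance name, then places it in a JSON parameter string for OAuth2 authentication. The helper names, the organization and instance values, the issuer URL, and the key file path are all hypothetical. The parameter names (issuer_url, private_key, audience) are assumed to match those accepted by the Apache Pulsar Python client's OAuth2 authentication, so verify them against the client version you use.

```python
import json


def instance_urn(organization: str, instance: str) -> str:
    """Build a URN in the urn:sn:pulsar:<namespace>:<name> format."""
    for part in (organization, instance):
        if not part or ":" in part:
            raise ValueError(f"invalid URN component: {part!r}")
    return f"urn:sn:pulsar:{organization}:{instance}"


def oauth2_params(issuer_url: str, key_file_path: str, urn: str) -> str:
    """Serialize OAuth2 parameters, using the URN as the audience (assumed field names)."""
    return json.dumps(
        {"issuer_url": issuer_url, "private_key": key_file_path, "audience": urn}
    )


# Placeholder values only; substitute the names reported by snctl.
urn = instance_urn("my-org", "my-instance")
params = oauth2_params("https://auth.example.com/", "/path/to/key-file.json", urn)
print(urn)  # urn:sn:pulsar:my-org:my-instance
```

A string such as params is the kind of value a client-side OAuth2 plugin would typically receive. How it is passed depends on the client library.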

Flink SQL

Currently, Flink SQL is only available for the standard subscription plan through snctl.

Flink SQL enables you to execute interactive SQL queries against Pulsar within StreamNative Cloud. You can use the StreamNative Cloud API to provision the computing resources, including a Flink cluster and an SQL gateway for submitting queries to the backend Flink cluster through the StreamNative Cloud Console.

In StreamNative Cloud, Flink SQL and Pulsar are integrated seamlessly to provide an end-to-end solution for real-time data exploration. The solution uses Flink’s Catalog API and Pulsar Schema Registry to expose Pulsar topics as Flink tables. There are a few core abstractions to understand:

  • Catalog: a catalog is a collection of databases. It is mapped to an existing Pulsar cluster.

  • Database: a database is a collection of tables. It is mapped to a namespace in Apache Pulsar. Each namespace within a Pulsar cluster is treated as a Flink database. Databases can also be created or deleted via Data Definition Language (DDL) statements; an associated Pulsar namespace will be created or deleted.

  • Table: a Pulsar topic can be presented as a STREAMING table or an UPSERT table.

  • Schema: the schema of a Pulsar topic will be automatically mapped as Flink table schema if the topic already exists with a schema. If a Pulsar topic does not exist, creating a table through a DDL statement will automatically create a Pulsar schema.

  • Metadata columns: the message metadata and properties of a Pulsar message are mapped into the metadata columns of a Flink table. The following metadata columns are supported:

    • messageId: the message ID of a Pulsar message (read-only).
    • sequenceId: the sequence ID of a Pulsar message (read-only).
    • publishTime: the publish timestamp of a Pulsar message (read-only).
    • eventTime: the event timestamp of a Pulsar message (read or write).
    • properties: the message properties of a Pulsar message (read or write).

Flink SQL also keeps the system secure while running submitted queries in the following ways:

  • Use OAuth2 authentication for connecting to the Flink cluster.
  • Use TLS to protect the endpoints.
  • Use the user's role to access Pulsar topics and namespaces.

Here are the common operations that you can perform in the interactive SQL scenario.

  1. Create a Pulsar instance and Pulsar cluster.

  2. Create a service account that has the super-admin permission for the Pulsar cluster and download the key file of the service account to your local computer.

  3. Create a Flink cluster. Then, the Flink cluster is automatically associated with the Pulsar cluster.

  4. Connect to the Pulsar cluster through the OAuth2 authentication plugin.

  5. Load data to the tables in the target Flink database.

  6. Execute interactive queries with SQL.

  7. Check query results in the same SQL editor window to verify the results.

For details about how to create a Flink cluster, execute interactive queries, and check query results, see Flink SQL.

Pulsar components

Tenants

Pulsar was created from the ground up as a multi-tenant system. To support multi-tenancy, Pulsar has a concept of tenants. Tenants can be spread across clusters and can each have their own authentication and authorization scheme applied to them. They are also the administrative unit at which storage quotas, message TTL, and isolation policies can be managed.

For details about how to create tenants, see work with tenant.

Namespaces

A namespace represents an administrative unit within a tenant. The configuration policies set on a namespace apply to all the topics created in that namespace. You can create multiple namespaces for a tenant using the StreamNative Cloud Console, REST API or the pulsar-admin CLI tool.

For details about how to create namespaces, see work with namespace.

Permissions

In Pulsar, permissions are managed at the namespace level (within tenants and clusters). You can grant permissions to specific users for lists of operations such as produce and consume. In addition, you can revoke permissions from specific users, which means that those users cannot access the specified namespace.

Backlog quotas

Backlogs are sets of unacknowledged messages for a topic that have been stored by bookies. Pulsar stores all unacknowledged messages in backlogs until they are processed and acknowledged. You can use the backlog quotas to control the allowable size of backlogs at the namespace level. You can set the following items for a backlog quota:

  • an allowable size threshold for each topic in the namespace
  • a retention policy that determines the action the broker takes if the threshold is exceeded.

The following table lists available retention policies.

Policy | Action
producer_request_hold | The broker holds but does not persist producers' request payload.
producer_exception | The broker disconnects from the client by throwing an exception.
consumer_backlog_eviction | The broker begins discarding backlog messages.
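To make the three policies concrete, here is a toy in-memory model of a topic backlog with a size threshold. It is a simplified sketch, not how the broker is implemented: the class and exception names are hypothetical, the quota is checked on every publish, and a negative limit is assumed to mean that no quota applies.

```python
from collections import deque


class BacklogQuotaExceeded(Exception):
    """Raised by the toy model when the producer_exception policy applies."""


class TopicBacklog:
    """Toy model of a backlog that tracks message sizes in bytes."""

    def __init__(self, limit_bytes: int, policy: str):
        self.limit_bytes = limit_bytes  # assumption: negative means unlimited
        self.policy = policy
        self.messages = deque()  # sizes of unacknowledged messages, oldest first
        self.size_bytes = 0

    def _store(self, msg_bytes: int) -> str:
        self.messages.append(msg_bytes)
        self.size_bytes += msg_bytes
        return "stored"

    def publish(self, msg_bytes: int) -> str:
        within_quota = self.size_bytes + msg_bytes <= self.limit_bytes
        if self.limit_bytes < 0 or within_quota:
            return self._store(msg_bytes)
        if self.policy == "producer_request_hold":
            # The request is held and nothing is persisted.
            return "held"
        if self.policy == "producer_exception":
            raise BacklogQuotaExceeded("backlog quota exceeded")
        if self.policy == "consumer_backlog_eviction":
            # Discard the oldest messages until the new one fits.
            while self.messages and self.size_bytes + msg_bytes > self.limit_bytes:
                self.size_bytes -= self.messages.popleft()
            return self._store(msg_bytes)
        raise ValueError(f"unknown policy: {self.policy}")
```

With a 100-byte limit and the consumer_backlog_eviction policy, publishing two 60-byte messages leaves only the second message in the backlog.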

Bundle

For assignment, a namespace is sharded into a list of bundles, with each bundle comprising a portion of the overall hash range of the namespace. A bundle is a virtual group of topics that belong to the same namespace. A namespace bundle is defined as a range between two 32-bit hashes, such as 0x00000000 and 0xffffffff. By default, four bundles are supported for each namespace.

Since the load for topics in a bundle might change over time, brokers can split one bundle into two. The new, smaller bundles can then be reassigned to different brokers. By default, the newly split bundles are immediately offloaded to other brokers to facilitate the traffic distribution.
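The hash-range idea can be sketched in a few lines. The example below divides the 32-bit range into equal bundles, looks up the bundle that a topic falls into, and splits a bundle at its midpoint. It is an illustration under assumptions: the function names are hypothetical, zlib.crc32 stands in for the hash function the broker actually uses, and an even midpoint split is only one possible split strategy.

```python
import bisect
import zlib

FULL_LOWER_BOUND = 0x00000000
FULL_UPPER_BOUND = 0xFFFFFFFF


def bundle_boundaries(num_bundles: int = 4) -> list[int]:
    """Return the boundaries that divide the 32-bit hash range into equal bundles."""
    if num_bundles < 1:
        raise ValueError("num_bundles must be at least 1")
    segment = (FULL_UPPER_BOUND + 1) // num_bundles
    bounds = [FULL_LOWER_BOUND + i * segment for i in range(num_bundles)]
    bounds.append(FULL_UPPER_BOUND)
    return bounds


def find_bundle(topic: str, bounds: list[int]) -> tuple[int, int]:
    """Return the (lower, upper) range of the bundle that the topic hashes into."""
    topic_hash = zlib.crc32(topic.encode("utf-8"))  # stand-in 32-bit hash
    index = min(bisect.bisect_right(bounds, topic_hash) - 1, len(bounds) - 2)
    return bounds[index], bounds[index + 1]


def split_bundle(lower: int, upper: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Split one bundle into two halves at the midpoint of its range."""
    middle = lower + (upper - lower) // 2
    return (lower, middle), (middle, upper)


bounds = bundle_boundaries(4)
print([f"0x{b:08x}" for b in bounds])
# ['0x00000000', '0x40000000', '0x80000000', '0xc0000000', '0xffffffff']
```

Because every topic in a namespace maps to exactly one range, moving a bundle between brokers moves all of the topics in that range together.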

Dispatch rate

Dispatch rate refers to the rate at which topics in a namespace dispatch messages. It can be restricted by the number of messages (msg-dispatch-rate) or by the number of bytes of messages (byte-dispatch-rate) dispatched per period. The period is expressed in seconds and can be configured with dispatch-rate-period. By default, msg-dispatch-rate and byte-dispatch-rate are both set to -1, which indicates that throttling is disabled.
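The interaction of the three settings can be illustrated with a simple fixed-window limiter. This is a sketch of the general technique, not the broker's implementation. The class name is hypothetical, and the clock is passed in explicitly so that the behavior is deterministic.

```python
class DispatchLimiter:
    """Fixed-window limiter with message and byte limits; -1 disables a limit."""

    def __init__(self, msg_rate: int = -1, byte_rate: int = -1, period_seconds: float = 1):
        self.msg_rate = msg_rate
        self.byte_rate = byte_rate
        self.period_seconds = period_seconds
        self.window_start = 0.0
        self.msgs_in_window = 0
        self.bytes_in_window = 0

    def try_dispatch(self, size_bytes: int, now: float) -> bool:
        """Return True if one message of size_bytes may be dispatched at time now."""
        if now - self.window_start >= self.period_seconds:
            # A new period begins, so reset both counters.
            self.window_start = now
            self.msgs_in_window = 0
            self.bytes_in_window = 0
        if self.msg_rate >= 0 and self.msgs_in_window + 1 > self.msg_rate:
            return False
        if self.byte_rate >= 0 and self.bytes_in_window + size_bytes > self.byte_rate:
            return False
        self.msgs_in_window += 1
        self.bytes_in_window += size_bytes
        return True
```

With msg_rate=2 and a one-second period, the third message offered within the same second is refused, and dispatching resumes once the next period starts. With both limits left at -1, every message is allowed.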

Topics

As in other pub-sub systems, topics in Pulsar are named channels for transmitting messages from producers to consumers. Pulsar supports persistent and non-persistent topics. By default, a persistent topic is created if you do not specify a topic type. With persistent topics, all messages are durably persisted on disks (if the broker is not standalone, messages are durably persisted on multiple disks), whereas data for non-persistent topics is not persisted to storage disks.
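In Apache Pulsar, the topic type, tenant, and namespace are encoded in the topic name, in the form persistent://tenant/namespace/topic or non-persistent://tenant/namespace/topic. The following sketch parses such names and applies the defaults for short names (the persistent type and the public/default namespace). The type and function names are hypothetical, and the validation is deliberately minimal.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Parse a full or short topic name, defaulting to a persistent topic."""
    if "://" in name:
        domain, rest = name.split("://", 1)
    else:
        domain, rest = "persistent", name  # no type given: persistent by default
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown topic type: {domain}")
    parts = rest.split("/", 2)
    if len(parts) == 1:
        tenant, namespace, topic = "public", "default", parts[0]
    elif len(parts) == 3:
        tenant, namespace, topic = parts
    else:
        raise ValueError(f"invalid topic name: {name}")
    return TopicName(domain, tenant, namespace, topic)


print(parse_topic("orders"))
# TopicName(domain='persistent', tenant='public', namespace='default', topic='orders')
```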

For more information about topics, see topics.

Non-persistent topics

Pulsar also supports non-persistent topics, which are topics on which messages are never persisted to disk and live only in memory. When using non-persistent delivery, killing a Pulsar broker or disconnecting a subscriber from a topic means that all in-transit messages are lost on that non-persistent topic. In non-persistent topics, brokers immediately deliver messages to all connected subscribers without persisting them in BookKeeper.

For more information about non-persistent topics, see non-persistent topics.

Partitioned topics

Normal topics are served by a single broker, which limits the maximum throughput of the topic. Partitioned topics are a special type of topic handled by multiple brokers, thus allowing for higher throughput. A partitioned topic is actually implemented as N internal topics, where N is the number of partitions. When publishing messages to a partitioned topic, each message is routed to one of several brokers. The distribution of partitions across brokers is handled automatically by Pulsar. It's recommended to have at least one partition per topic so that you can add more partitions in the future. If there are zero partitions (a non-partitioned topic), you will not be able to add more partitions to the topic after it is created. You can have from 1 up to 100 partitions per topic.
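The following sketch shows how a partitioned topic expands into N internal topics and how a producer might choose a partition for each message. The "-partition-N" suffix follows Apache Pulsar's naming for internal topics. Everything else is an assumption for illustration: the names are hypothetical, zlib.crc32 stands in for the client's real hash function, and round-robin is only one of the routing modes a client may use for messages without a key.

```python
import itertools
import zlib
from typing import Optional


def partition_names(topic: str, num_partitions: int) -> list[str]:
    """List the internal topics that back a partitioned topic."""
    return [f"{topic}-partition-{i}" for i in range(num_partitions)]


class PartitionRouter:
    """Route keyed messages by hash and keyless messages round-robin."""

    def __init__(self, num_partitions: int):
        if num_partitions < 1:
            raise ValueError("a partitioned topic needs at least one partition")
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def choose(self, key: Optional[str] = None) -> int:
        if key is None:
            return next(self._round_robin)
        # The same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


print(partition_names("persistent://public/default/orders", 3))
# ['persistent://public/default/orders-partition-0',
#  'persistent://public/default/orders-partition-1',
#  'persistent://public/default/orders-partition-2']
```

Routing by key keeps all messages for one key on one partition, which preserves their relative order. Round-robin spreads keyless messages evenly across partitions.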

For more information about partitioned topics, see partitioned topics.

Subscriptions

A subscription is a named configuration rule that determines how messages are delivered to consumers. Four subscription modes are available in Pulsar: exclusive, shared, failover, and key_shared.
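As a rough mental model, the four modes differ in which connected consumer receives a given message. The sketch below is a simplification under stated assumptions: the function name is hypothetical, the first consumer in the list is treated as the active one for failover, and zlib.crc32 stands in for the key hashing that the broker performs for key_shared.

```python
import zlib
from typing import Optional


def pick_consumer(mode: str, consumers: list[str], message_index: int,
                  key: Optional[str] = None) -> str:
    """Choose which consumer receives a message under a subscription mode."""
    if not consumers:
        raise ValueError("no consumers connected")
    if mode == "exclusive":
        # Only a single consumer may attach to the subscription.
        if len(consumers) > 1:
            raise ValueError("an exclusive subscription allows only one consumer")
        return consumers[0]
    if mode == "failover":
        # One active consumer gets every message; the others are standbys.
        return consumers[0]
    if mode == "shared":
        # Messages are distributed across consumers in turn.
        return consumers[message_index % len(consumers)]
    if mode == "key_shared":
        # Messages with the same key go to the same consumer.
        if key is None:
            raise ValueError("key_shared routing needs a message key")
        return consumers[zlib.crc32(key.encode("utf-8")) % len(consumers)]
    raise ValueError(f"unknown subscription mode: {mode}")
```

For example, with consumers ["a", "b"], the shared mode alternates between the two, while failover keeps delivering to "a" until it disconnects and is removed from the list.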

For more information about subscription modes, see subscriptions.

Schema

Pulsar has a built-in schema registry that enables clients to upload data schemas on a per-topic basis. Those schemas dictate which data types are recognized as valid for that topic. Pulsar schema enables you to use language-specific types of data when constructing and handling messages from simple types like string to more complex application-specific types.
