Get Started with Geo-Replication
Geo-replication provides disaster tolerance by keeping copies of your data in clusters located in geographically distributed data centers. Apache Pulsar provides geo-replication out of the box, without the need for external tools.
Geo-replication and Pulsar
Geo-replication is the replication of persistently stored message data across multiple clusters of a Pulsar instance.
- You must enable geo-replication on a per-tenant basis in Pulsar.
- You can enable geo-replication between two clusters only when a tenant has been created that allows access to both clusters. Although access is granted per tenant, replication itself is managed at the namespace level.
When messages are produced on a Pulsar topic, messages are first persisted in the local cluster, and then forwarded asynchronously to the remote clusters.
Typically, messages are replicated immediately (at the same time as they are dispatched to local consumers). The network round-trip time (RTT) between the remote regions defines end-to-end delivery latency.
Replication and isolation policies
In large Pulsar installations, workloads often grow more complex: applications may span regions, and some workloads need a higher-level SLA that may call for dedicated hardware.
Pulsar includes functionality and policies to control geo-replication, as well as isolation primitives to give fine-grained control over where workloads run. These policies include:
- Geo-replication controls
- Controls over which brokers and bookies are used for a given namespace
- Controls to ensure that some workloads aren't placed together
Step 1: Design a geo-replication configuration
To properly configure geo-replication for your organization, it is important to decide on a design that maps to your use case.
Some common use cases include:
- Cross-site availability
- Multi-cluster load distribution
- Data aggregation
- Data migration
Step 2: Create a Pulsar instance
A Pulsar instance consists of multiple Pulsar clusters working in unison.
To deploy a multi-cluster Pulsar instance:
Deploy two separate ZooKeeper quorums:
- a local quorum for each cluster in the instance
- a configuration store quorum for instance-wide tasks
Initialize the cluster metadata for each cluster.
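As a rough sketch of the initialization step, the snippet below builds one `bin/pulsar initialize-cluster-metadata` command per cluster. It assumes the ZooKeeper-style flags (`--zookeeper`, `--configuration-store`); newer Pulsar releases also accept metadata-store equivalents. All host names and cluster names are hypothetical placeholders, and the script only prints the commands rather than running them.

```python
import shlex


def init_metadata_cmd(cluster, local_zk, config_store, web_url, broker_url):
    """Build the argv for `bin/pulsar initialize-cluster-metadata`."""
    return [
        "bin/pulsar", "initialize-cluster-metadata",
        "--cluster", cluster,
        "--zookeeper", local_zk,                 # local quorum for this cluster
        "--configuration-store", config_store,   # instance-wide quorum
        "--web-service-url", web_url,
        "--broker-service-url", broker_url,
    ]


# Hypothetical topology: one shared configuration store, one local quorum per cluster.
CONFIG_STORE = "zk-config.example.com:2184"
LOCAL_QUORUMS = {
    "us-west": "zk-west.example.com:2181",
    "us-east": "zk-east.example.com:2181",
}

commands = [
    init_metadata_cmd(
        name,
        zk,
        CONFIG_STORE,
        f"http://broker.{name}.example.com:8080",
        f"pulsar://broker.{name}.example.com:6650",
    )
    for name, zk in LOCAL_QUORUMS.items()
]

# Dry run: print each command so it can be reviewed before being executed.
for cmd in commands:
    print(shlex.join(cmd))
```

Every cluster points at the same configuration store but at its own local quorum, which is what ties the separate clusters into one instance.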
Step 3: Create the replication clusters
Before you can enable geo-replication on a cluster, you need to make sure the source cluster can "see" the replica cluster.
Create the clusters inside the Pulsar instance that you want to use for geo-replication.
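One way to register a replica cluster is through the Pulsar admin REST API. The sketch below builds, but does not send, a `PUT /admin/v2/clusters/{cluster}` request using only the Python standard library. The admin URL, cluster name, and service URLs are hypothetical, and authentication is left out.

```python
import json
import urllib.request


def create_cluster_request(admin_url, cluster, web_url, broker_url):
    """Build a request that registers `cluster` with the Pulsar instance."""
    body = json.dumps({
        "serviceUrl": web_url,           # HTTP admin/web endpoint of the cluster
        "brokerServiceUrl": broker_url,  # binary protocol endpoint of the cluster
    }).encode("utf-8")
    return urllib.request.Request(
        f"{admin_url}/admin/v2/clusters/{cluster}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical example: make the replica (us-east) visible from the source (us-west).
cluster_req = create_cluster_request(
    "http://broker.us-west.example.com:8080",
    "us-east",
    "http://broker.us-east.example.com:8080",
    "pulsar://broker.us-east.example.com:6650",
)
print(cluster_req.get_method(), cluster_req.full_url)

# To actually send it against a live broker:
#   urllib.request.urlopen(cluster_req)
```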
Step 4: Grant tenant permission across the replication clusters
To replicate to a replica cluster, the tenant on the source cluster needs permission to use that cluster.
Note: If you are using a global config store, you only need to grant permissions on any one of the clusters. Otherwise, you need to grant permissions on all source clusters.
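Tenant permissions can be expressed as the tenant's list of allowed clusters. The following sketch builds a `PUT /admin/v2/tenants/{tenant}` request for the admin REST API that creates a tenant allowed on both clusters. The tenant name, role, and URLs are hypothetical, and the request is constructed without being sent.

```python
import json
import urllib.request


def create_tenant_request(admin_url, tenant, admin_roles, allowed_clusters):
    """Build a request that creates `tenant` with access to `allowed_clusters`."""
    body = json.dumps({
        "adminRoles": list(admin_roles),
        "allowedClusters": list(allowed_clusters),
    }).encode("utf-8")
    return urllib.request.Request(
        f"{admin_url}/admin/v2/tenants/{tenant}",
        data=body,
        method="PUT",  # PUT creates the tenant; POST to the same path updates one
        headers={"Content-Type": "application/json"},
    )


# Hypothetical tenant that may use both the source and the replica cluster.
tenant_req = create_tenant_request(
    "http://broker.us-west.example.com:8080",
    "my-tenant",
    ["my-admin-role"],
    ["us-west", "us-east"],
)
print(tenant_req.get_method(), tenant_req.full_url)

# To actually send it against a live broker:
#   urllib.request.urlopen(tenant_req)
```

Without a global configuration store, the same request would need to be issued against the admin endpoint of each source cluster.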
Step 5: Enable geo-replication at the namespace level
To enable geo-replication for a namespace, configure the namespace to replicate across two or more provisioned clusters. Once you create a geo-replicated namespace, any topics that producers or consumers create within that namespace are replicated across those clusters.
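As an illustration, a namespace's replication clusters can be set through the admin REST API with a `POST` to the namespace's `replication` resource. The sketch below assumes the namespace already exists and that the tenant is allowed on every listed cluster. All names and URLs are hypothetical, and the request is built but not sent.

```python
import json
import urllib.request


def set_replication_clusters_request(admin_url, tenant, namespace, clusters):
    """Build a request that sets the clusters a namespace replicates across."""
    clusters = list(clusters)
    if len(clusters) < 2:
        # Geo-replication needs at least two clusters to replicate between.
        raise ValueError("geo-replication needs two or more clusters")
    return urllib.request.Request(
        f"{admin_url}/admin/v2/namespaces/{tenant}/{namespace}/replication",
        data=json.dumps(clusters).encode("utf-8"),  # body is a JSON array of names
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical namespace replicated between two clusters.
replication_req = set_replication_clusters_request(
    "http://broker.us-west.example.com:8080",
    "my-tenant",
    "my-namespace",
    ["us-west", "us-east"],
)
print(replication_req.get_method(), replication_req.full_url)

# To actually send it against a live broker:
#   urllib.request.urlopen(replication_req)
```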
Step 6: Configure geo-replication policies
Policies for geo-replication that you can configure include:
- Selective replication of messages - Use to restrict replication by specifying a replication list for a message, so that the message is replicated only to the clusters in that list.
- Replicated subscriptions - Use to keep subscription state in sync across clusters.
- Dispatch throttling - Use to set the dispatch rate at the namespace and topic levels.
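The first two policies in this list can be applied from client code. Below is a minimal sketch using the Pulsar Python client (`pulsar-client`), relying on its `replication_clusters` argument to `send` and its `replicate_subscription_state_enabled` argument to `subscribe`. The helper names, topic, subscription, and cluster names are hypothetical, and `demo` needs a running broker, so it is defined but not called.

```python
def send_to_subset(producer, payload, clusters):
    """Selective replication: replicate this one message only to `clusters`."""
    return producer.send(payload, replication_clusters=list(clusters))


def subscribe_replicated(client, topic, subscription):
    """Replicated subscription: ask Pulsar to keep subscription state in sync."""
    return client.subscribe(
        topic,
        subscription,
        replicate_subscription_state_enabled=True,
    )


def demo(service_url="pulsar://localhost:6650"):
    """Hypothetical end-to-end usage; requires `pip install pulsar-client`."""
    import pulsar  # imported lazily so the helpers above have no hard dependency

    client = pulsar.Client(service_url)
    try:
        topic = "persistent://my-tenant/my-namespace/my-topic"
        consumer = subscribe_replicated(client, topic, "my-subscription")
        producer = client.create_producer(topic)
        # Replicate this message to us-east only, not to every cluster
        # configured on the namespace.
        send_to_subset(producer, b"order-created", ["us-east"])
        msg = consumer.receive()
        consumer.acknowledge(msg)
    finally:
        client.close()
```

The replication list applies per message, so other messages on the same topic still follow the namespace-level cluster list.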
- Follow the steps to configure asynchronous geo-replication.
- Watch the 3-part video series: Geo-Replication Overview, Geo-Replication Patterns, and How-to enable geo-replication.
- Read the deep dive into data placement policies.
- Review the Pulsar geo-replication documentation.