Get Started with Geo-Replication

Geo-replication provides disaster tolerance by keeping copies of your data in different clusters across geographically distributed data centers. Apache Pulsar provides geo-replication out of the box, without the need for external tools.

Geo-replication and Pulsar

Geo-replication is the replication of persistently stored message data across multiple clusters of a Pulsar instance.

  • Geo-replication must be enabled on a per-tenant basis in Pulsar.
  • You can enable geo-replication between clusters only when the tenant has been granted access to all of them (for example, to both clusters in a two-cluster setup). Geo-replication itself is then managed at the namespace level.
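The two rules above split responsibility: the tenant decides which clusters are allowed, and each namespace picks its replication clusters from that set. The sketch below is a minimal model of that relationship. The `Tenant`/`Namespace` classes and the `disallowed_clusters` helper are hypothetical illustrations, not Pulsar APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    name: str
    allowed_clusters: frozenset  # granted once, per tenant


@dataclass(frozen=True)
class Namespace:
    tenant: Tenant
    name: str
    replication_clusters: frozenset  # chosen per namespace


def disallowed_clusters(ns: Namespace) -> set:
    """Replication targets the namespace asks for that its tenant may not use (hypothetical helper)."""
    return set(ns.replication_clusters - ns.tenant.allowed_clusters)


if __name__ == "__main__":
    tenant = Tenant("my-tenant", frozenset({"us-west", "us-east"}))
    ok = Namespace(tenant, "orders", frozenset({"us-west", "us-east"}))
    bad = Namespace(tenant, "audit", frozenset({"us-west", "us-cent"}))
    print(disallowed_clusters(ok), disallowed_clusters(bad))
```

A namespace is only valid for geo-replication when this set is empty. In the example, `us-cent` would first have to be added to the tenant's allowed clusters.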

When messages are produced on a Pulsar topic, messages are first persisted in the local cluster, and then forwarded asynchronously to the remote clusters.

Typically, messages are replicated immediately (at the same time as they are dispatched to local consumers). End-to-end delivery latency is therefore largely determined by the network round-trip time (RTT) between the remote regions.
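The "persist locally, then forward asynchronously" order can be illustrated with a toy `asyncio` model. This is a conceptual sketch, not Pulsar's replicator implementation. The `Cluster` class, the `produce`/`forward` functions, and the fixed `network_delay` standing in for the WAN hop are all assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Stand-in for one Pulsar cluster; `log` plays the role of the persisted topic."""
    name: str
    log: list = field(default_factory=list)


async def forward(msg: str, remote: Cluster, network_delay: float) -> None:
    # Simulated WAN hop: the remote copy appears only after the network delay.
    await asyncio.sleep(network_delay)
    remote.log.append(msg)


async def produce(msg: str, local: Cluster, remotes: list, network_delay: float) -> list:
    # 1. Persist in the local cluster first (in this model, the producer could be acked here).
    local.log.append(msg)
    # 2. Forward to each remote cluster in the background; the producer does not wait.
    return [asyncio.create_task(forward(msg, r, network_delay)) for r in remotes]


async def main():
    west, east = Cluster("us-west"), Cluster("us-east")
    pending = await produce("order-42", west, [east], network_delay=0.05)
    print("right after produce:", west.log, east.log)
    await asyncio.gather(*pending)  # wait for replication to catch up
    print("after replication:  ", west.log, east.log)
    return west, east


if __name__ == "__main__":
    asyncio.run(main())
```

Right after `produce` returns, only the local log holds the message. The remote copy shows up one network delay later, which is why cross-region latency dominates end-to-end delivery time.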

Replication and isolation policies

In large Pulsar installations, workloads often become more complex: some applications span regions, and some workloads need a higher-level SLA that may call for dedicated hardware.

Pulsar includes functionality and policies to control geo-replication, as well as isolation primitives to give fine-grained control over where workloads run. These policies include:

  • Geo-replication controls
  • Controls over which brokers and bookies are used for a given namespace
  • Controls to ensure that some workloads aren't placed together
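To make the broker-placement control more concrete, the sketch below models one common isolation rule. A namespace is pinned to brokers matching "primary" patterns and may spill over to "secondary" brokers only when too few primaries are available. This is a simplified, hypothetical model of the `min_available` failover behavior configured with `pulsar-admin ns-isolation-policy set`. Real policies have more parameters, so treat the logic as an assumption and check the Pulsar isolation docs for exact semantics.

```python
import re


def select_brokers(available, primary, secondary, min_limit):
    """Simplified model of a namespace isolation policy with a min_available rule.

    available: names of brokers currently up; primary/secondary: regex patterns.
    """
    def matches(broker, patterns):
        return any(re.fullmatch(p, broker) for p in patterns)

    primary_brokers = [b for b in available if matches(b, primary)]
    if len(primary_brokers) >= min_limit:
        # Enough primary brokers: the namespace stays on them exclusively.
        return primary_brokers
    # Too few primaries: also allow brokers that match the secondary patterns.
    secondary_brokers = [b for b in available
                         if b not in primary_brokers and matches(b, secondary)]
    return primary_brokers + secondary_brokers


if __name__ == "__main__":
    brokers = ["broker-a1", "broker-a2", "broker-b1", "broker-c1"]
    print(select_brokers(brokers, [r"broker-a.*"], [r"broker-b.*"], min_limit=1))
    print(select_brokers(brokers, [r"broker-a.*"], [r"broker-b.*"], min_limit=3))
```

Brokers that match neither pattern (`broker-c1` here) are never chosen. This is what keeps isolated workloads off shared hardware.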

QuickStart

Step 1: Design a geo-replication configuration

To properly configure geo-replication for your organization, it is important to decide on a design that maps to your use case.

Some common use cases include:

  • Cross-site availability
  • Multi-cluster load distribution
  • Fail-over
  • Data aggregation
  • Data migration

Step 2: Create a Pulsar instance

A Pulsar instance consists of multiple Pulsar clusters working in unison.

To deploy a multi-cluster Pulsar instance:

  • Deploy two kinds of ZooKeeper quorum:

    • a local quorum for each cluster in the instance
    • a configuration store quorum for instance-wide tasks
  • Initialize the cluster metadata for each cluster.
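Cluster metadata is initialized with `bin/pulsar initialize-cluster-metadata`. Each cluster points at its own local quorum and at the shared configuration store. The sketch below only builds the argument list; it does not run it. The flag names follow the ZooKeeper-based Pulsar multi-cluster deployment docs. Newer releases rename `--zookeeper`/`--configuration-store` to metadata-store flags, so confirm with `bin/pulsar initialize-cluster-metadata --help`. Hostnames are hypothetical, and 8080/6650 are the default web and broker ports.

```python
from __future__ import annotations

import shlex


def init_metadata_cmd(cluster: str, local_zk: str, config_store: str, host: str) -> list[str]:
    """Build a `pulsar initialize-cluster-metadata` invocation for one cluster."""
    return [
        "bin/pulsar", "initialize-cluster-metadata",
        "--cluster", cluster,
        "--zookeeper", local_zk,                # this cluster's local quorum
        "--configuration-store", config_store,  # shared, instance-wide quorum
        "--web-service-url", f"http://{host}:8080/",
        "--broker-service-url", f"pulsar://{host}:6650/",
    ]


if __name__ == "__main__":
    # Hypothetical hostnames; every cluster points at the same configuration store.
    for name in ("us-west", "us-east"):
        print(shlex.join(init_metadata_cmd(
            name,
            f"zk1.{name}.example.com:2181",
            "zk1.global.example.com:2184",
            f"pulsar.{name}.example.com",
        )))
```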

Step 3: Create the replication clusters

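Before you can enable geo-replication on a cluster, you need to make sure the source cluster can "see" the replica cluster, that is, that the replica cluster is registered in the instance with service URLs the source cluster can reach.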

Create the clusters inside the Pulsar instance that you want to use for geo-replication.
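Registering a cluster amounts to recording its service URLs with `pulsar-admin clusters create`. The helper below builds that command using the `--url` and `--broker-url` flags shown in the Apache Pulsar geo-replication docs; TLS variants (`--url-secure`, `--broker-url-secure`) are omitted for brevity. The helper name, hostnames, and default ports are illustrative assumptions.

```python
from __future__ import annotations

import shlex


def create_cluster_cmd(name: str, host: str, web_port: int = 8080, broker_port: int = 6650) -> list[str]:
    """Build a `pulsar-admin clusters create` call that registers a cluster's service URLs."""
    return [
        "bin/pulsar-admin", "clusters", "create",
        "--url", f"http://{host}:{web_port}",              # web/admin service URL
        "--broker-url", f"pulsar://{host}:{broker_port}",  # binary-protocol service URL
        name,                                              # cluster name comes last
    ]


if __name__ == "__main__":
    # Hypothetical: register us-east so that us-west can replicate to it.
    print(shlex.join(create_cluster_cmd("us-east", "pulsar.us-east.example.com")))
```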

Step 4: Grant tenant permission across the replication clusters

To replicate to a replica cluster, the tenant on the source cluster needs permission to use that cluster.

Note: If you are using a global configuration store, you only need to grant permissions on any one of the clusters. Otherwise, you need to grant permissions on all source clusters.
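Tenant permissions are granted through the tenant's allowed-clusters list. A sketch that builds the `pulsar-admin tenants create` command (or `tenants update` for an existing tenant) with `--admin-roles` and `--allowed-clusters` could look like the following. The helper and the tenant, role, and cluster names are hypothetical; verify the flags against the `pulsar-admin` reference for your version.

```python
from __future__ import annotations

import shlex


def tenant_cmd(tenant: str, admin_roles: list[str], allowed_clusters: list[str],
               update: bool = False) -> list[str]:
    """Build `pulsar-admin tenants create|update` granting the tenant access to clusters."""
    return [
        "bin/pulsar-admin", "tenants", "update" if update else "create", tenant,
        "--admin-roles", ",".join(admin_roles),
        "--allowed-clusters", ",".join(allowed_clusters),  # every cluster to replicate to
    ]


if __name__ == "__main__":
    print(shlex.join(tenant_cmd("my-tenant", ["my-admin-role"], ["us-west", "us-east", "us-cent"])))
```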

Step 5: Enable geo-replication at the namespace level

To enable geo-replication for a namespace, configure the namespace to replicate across two or more provisioned clusters. Once you create a geo-replicated namespace, any topics that producers or consumers create within that namespace are replicated across those clusters.
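Enabling replication for a namespace is a two-command sequence: create the namespace, then assign its replication clusters with `pulsar-admin namespaces set-clusters`. The sketch below builds both commands. The tenant, namespace, and cluster names are hypothetical. The clusters must already be in the tenant's allowed list, or the second command is expected to be rejected.

```python
from __future__ import annotations

import shlex


def namespace_cmds(tenant: str, namespace: str, clusters: list[str]) -> list[list[str]]:
    """Build the commands that create a namespace and set its replication clusters."""
    ns = f"{tenant}/{namespace}"
    return [
        ["bin/pulsar-admin", "namespaces", "create", ns],
        # Topics in this namespace are replicated across all listed clusters.
        ["bin/pulsar-admin", "namespaces", "set-clusters", ns, "--clusters", ",".join(clusters)],
    ]


if __name__ == "__main__":
    for cmd in namespace_cmds("my-tenant", "my-namespace", ["us-west", "us-east", "us-cent"]):
        print(shlex.join(cmd))
```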

Step 6: Configure geo-replication policies

Policies for geo-replication that you can configure include:

Next steps
