Geo-replication
Geo-replication is a common mechanism for providing disaster recovery in multi-datacenter deployments. It helps your data, and the systems that rely on it, withstand the loss of a data center to unforeseen disasters such as earthquakes or fires. Apache Pulsar comes with geo-replication as a built-in feature. Pulsar brokers perform geo-replication automatically, and you can enable, disable, or change geo-replication configurations dynamically at runtime.
Traditionally, geo-replication mechanisms fall into two categories: synchronous and asynchronous. Pulsar supports both: synchronous geo-replication within a single Pulsar cluster, and asynchronous geo-replication across multiple clusters.
Synchronous geo-replication
A synchronous geo-replicated Pulsar installation is a single Pulsar cluster that spans all regions: a BookKeeper cluster, a broker cluster, and one global ZooKeeper ensemble.
In synchronous geo-replication, when a client writes data to a Pulsar cluster in one region, the data is written to multiple bookies in all available regions within the same call. The write request is not acknowledged to the client until all the data centers have confirmed that the data has been persisted.
Synchronous geo-replication provides stronger data consistency guarantees because the data is always synchronized across the data centers. However, every write must wait for cross-datacenter round trips, which increases publish latency. Synchronous geo-replication is therefore a good fit for mission-critical use cases that can tolerate a slightly higher publish latency.
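The acknowledgment rule described above can be illustrated with a toy latency model. This is only a sketch, not Pulsar or BookKeeper code: the function names, region names, and latency figures are hypothetical, and it assumes the write is sent to all regions in parallel, so the client waits for the slowest confirmation.

```python
from __future__ import annotations


def synchronous_publish_latency_ms(confirm_ms: dict[str, float]) -> float:
    """Toy model: the ack is sent only after every region has confirmed.

    confirm_ms maps a (hypothetical) region name to the assumed time, in
    milliseconds, for that region to confirm that the data is persisted.
    With parallel writes, the client waits for the slowest region.
    """
    return max(confirm_ms.values())


def local_only_publish_latency_ms(confirm_ms: dict[str, float], local: str) -> float:
    """For comparison: latency if only the local region had to confirm."""
    return confirm_ms[local]


# Assumed, illustrative figures; real values depend on the deployment.
regions = {"us-west": 2.0, "us-east": 70.0, "eu-central": 150.0}

sync_latency = synchronous_publish_latency_ms(regions)
local_latency = local_only_publish_latency_ms(regions, "us-west")
```

With these assumed figures, the synchronous write is acknowledged after 150.0 ms, the slowest region, rather than after the 2.0 ms the local region needs on its own. That gap is the publish-latency cost of keeping every data center in sync.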
Asynchronous geo-replication
An asynchronous geo-replicated Pulsar installation consists of two or more independent Pulsar clusters running in different regions. Each Pulsar cluster contains its own set of brokers, bookies, and ZooKeeper nodes that are completely isolated from those of the other clusters.
In asynchronous geo-replication, when a client writes data to a Pulsar topic, the data is first persisted to the local Pulsar cluster. The producer receives a response immediately after the local cluster successfully persists the data. Then, the data is replicated asynchronously to Pulsar clusters in other regions.
Asynchronous geo-replication provides lower publish latency but weaker data consistency guarantees: because of replication lag, some data that the local cluster has already acknowledged may not yet have been replicated to the other clusters.
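The asynchronous flow and the resulting replication lag can be sketched with a small in-memory model. This is a simplified illustration under stated assumptions, not how Pulsar brokers are implemented: the `Cluster` and `AsyncReplicator` classes are hypothetical, and a plain queue stands in for the replication backlog.

```python
from __future__ import annotations

from collections import deque


class Cluster:
    """Hypothetical stand-in for one independent Pulsar cluster."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: list[bytes] = []  # messages persisted in this cluster


class AsyncReplicator:
    """Toy model: persist locally, acknowledge, replicate later."""

    def __init__(self, local: Cluster, remotes: list[Cluster]) -> None:
        self.local = local
        self.remotes = remotes
        self.pending: deque[bytes] = deque()  # not yet replicated

    def publish(self, message: bytes) -> bool:
        # Persist to the local cluster only, then acknowledge right away.
        self.local.log.append(message)
        self.pending.append(message)
        return True

    def replicate_once(self) -> None:
        # Background step: copy the oldest pending message to every remote.
        if self.pending:
            message = self.pending.popleft()
            for remote in self.remotes:
                remote.log.append(message)

    def lag(self) -> int:
        # Messages acknowledged locally but not yet present remotely.
        return len(self.pending)


west, east = Cluster("us-west"), Cluster("us-east")
replicator = AsyncReplicator(west, [east])

replicator.publish(b"m1")
replicator.publish(b"m2")
lag_before = replicator.lag()   # both messages are acknowledged but local only
replicator.replicate_once()
lag_after = replicator.lag()    # one message is still waiting
```

In this model the producer is acknowledged as soon as `publish` returns, while `east.log` trails `west.log` by `lag()` messages. If the local cluster were lost at that moment, the pending messages would be the ones missing from the other regions.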
For details about how to configure asynchronous geo-replication, see configure asynchronous geo-replication.
For detailed information about Pulsar geo-replication, see Pulsar documentation.