
Overview

In Pulsar, all data is stored in BookKeeper or tiered storage, and the metadata is stored in ZooKeeper. Both the data and the metadata are managed by systems external to Pulsar.

image of Pulsar consistency check

Because the metadata and the data live in different systems, they can drift apart. Whether the data is stored in BookKeeper or offloaded to tiered storage, it can go missing because of a fault in Pulsar or because of an operator's actions. The following cases can cause data inconsistency:

  • Bookie failures, including disk errors, broken indexes, bookie shutdowns, and so on.
  • Pulsar issues, such as deleting ledger data while keeping the metadata in ZooKeeper or tiered storage.
  • User operations, such as failing to delete topics, deleting ledgers with the bookie shell, decommissioning bookies, deleting cookies, and formatting metadata.
  • Network failure, such as a timeout for writing ledger replicas.
  • Offload failure, such as failing to offload data or metadata.

While audit logs can be used to monitor user operations, they cannot help predict system failures or other problems that are not caused by human action. PCK can detect mismatches between metadata and data regardless of their cause.

What is PCK?

PCK is a CLI tool for checking the consistency between Pulsar metadata and data. It gets the topic metadata from ZooKeeper and checks whether each ledger ID exists in BookKeeper or tiered storage. The results are printed to the terminal.

PCK provides the following functionality:

  • Detects missing ledgers: ledgers that the topic metadata references but that are not found in BookKeeper or tiered storage.
  • Detects orphan ledgers: ledgers that exist in BookKeeper or tiered storage but are not used by the topic according to its metadata.

Detect missing ledgers

As shown in the following illustration, PCK gets all ledgers of a topic from the topic metadata and then tries to find each of them in storage. If a ledger is not found in storage, PCK reports it as missing on the terminal.

image of detecting missing ledgers
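
The check described above amounts to a set difference. The following is a minimal Python sketch of that idea, not PCK's actual implementation. The function name find_missing_ledgers and the sample ledger IDs are hypothetical. The two inputs stand in for the ledger IDs read from the topic metadata and the ledger IDs found in BookKeeper or tiered storage.

```python
def find_missing_ledgers(topic_ledgers, stored_ledgers):
    """Return the ledger IDs that the topic metadata references
    but that are absent from storage, in ascending order."""
    stored = set(stored_ledgers)
    # A ledger is "missing" when metadata lists it and storage does not have it.
    return sorted(lid for lid in set(topic_ledgers) if lid not in stored)


# Hypothetical sample data: IDs listed in the topic metadata ...
topic_ledgers = [101, 102, 103, 104]
# ... and IDs actually present in BookKeeper or tiered storage.
stored_ledgers = {101, 103, 200}

missing = find_missing_ledgers(topic_ledgers, stored_ledgers)
print(missing)  # [102, 104]
```

Ledger 200 is not reported here because it is not referenced by the topic. It is a candidate for the orphan check instead.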

Find orphan ledgers

To find an orphan ledger, PCK performs the following operations:

  1. Fetches the topic ledger information from ZooKeeper.
  2. Gets the range of the ledgers from storage.
  3. Compares them to find the orphan ledger that is not used in that topic.

image of finding orphan ledgers
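
The three steps above can be sketched as the reverse set difference. This is an illustrative Python sketch under stated assumptions, not PCK's actual implementation. The function name find_orphan_ledgers and the sample IDs are hypothetical, and the sketch assumes that the "range of the ledgers from storage" has already been fetched and is passed in as a plain collection of ledger IDs.

```python
def find_orphan_ledgers(topic_ledgers, stored_ledgers_in_range):
    """Return the ledger IDs that exist in storage but are not
    used by the topic, in ascending order."""
    referenced = set(topic_ledgers)
    # A ledger is an "orphan" for this topic when storage has it
    # and the topic metadata does not reference it.
    return sorted(lid for lid in set(stored_ledgers_in_range)
                  if lid not in referenced)


# Step 1 (assumed already done): ledger IDs from the topic metadata.
topic_ledgers = [101, 102, 104]
# Step 2 (assumed already done): ledger IDs found in the storage range.
stored_ledgers_in_range = [101, 102, 103, 104, 105]

# Step 3: compare the two sets.
orphans = find_orphan_ledgers(topic_ledgers, stored_ledgers_in_range)
print(orphans)  # [103, 105]
```

The sketch only compares against a single topic. On a cluster with several topics, a ledger that this topic does not use may still belong to another topic, so the result is a list of candidates to verify, not ledgers that are safe to delete.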
