
Tutorials

Currently, PCK provides several commands that check the consistency between Pulsar and its storage, including BookKeeper and tiered storage (GCS, AWS). This document shows you how to find orphan ledgers, detect missing ledgers, delete under-replicated ledgers, and load inactive topics.

Detect missing ledgers

PCK gets all ledgers from a topic and then tries to find them in the storage. If a ledger is not found in the storage, the missing ledger is printed on the terminal. You can use the detect-missing-ledger command to detect ledgers that exist in the Pulsar metadata but not in the Bookie or the tiered storage.

Detect missing ledgers from Bookie

You can use the sn-pulsar-tools pck bookie detect-missing-ledger [options] command to detect the missing ledgers from the Bookie.

This table lists available options for the sn-pulsar-tools pck bookie detect-missing-ledger command.

| Option | Description | Required or optional |
| --- | --- | --- |
| --auth-param | The authentication token for connecting to the Pulsar cluster. | Optional |
| --auth-plugin | The authentication method for connecting to the Pulsar cluster. | Optional |
| -s, --service-url | The service URL of the Pulsar cluster. | Required |
| -z, --zookeeper | The ZooKeeper connection string that the cluster uses. | Required |
| -t, --tenant | The tenant of the topic that you want to check. | Optional |
| -n, --namespace | The namespace of the topic that you want to check. | Optional |
| -t, --topic | The topic that you want to check. | Required |

This example shows how to detect missing ledgers in the example topic of a locally-deployed cluster.

Input

/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools pck bookie detect-missing-ledger -z localhost:2181 -s http://localhost:8080 -t example

The output is similar to:

Output

Detected missing ledger in the topics [example]: []
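
When the command runs from a script, the bracketed list at the end of the output line can be extracted and tested. The sketch below is illustrative only: the function names missing_ledgers and has_missing_ledgers are hypothetical, and it assumes that a non-empty result places the ledger IDs inside the last pair of brackets, which the sample output above does not confirm.

```shell
# Print the contents of the final [...] on a "Detected missing ledger" line.
# Assumption: missing ledger IDs, if any, appear inside those brackets.
missing_ledgers() {
  printf '%s\n' "$1" | sed -n 's/^Detected missing ledger.*: \[\(.*\)\]$/\1/p'
}

# Succeed (exit status 0) only when the bracketed list is non-empty.
has_missing_ledgers() {
  [ -n "$(missing_ledgers "$1")" ]
}

# Example with the sample output line from this section.
line='Detected missing ledger in the topics [example]: []'
if has_missing_ledgers "$line"; then
  echo "missing ledgers: $(missing_ledgers "$line")"
else
  echo "no missing ledgers reported"
fi
```

In practice, the line variable would hold the captured output of the detect-missing-ledger command instead of a literal string.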

Detect missing ledgers from tiered storage

You can use the sn-pulsar-tools pck ts detect-missing-ledger [options] command to detect the missing ledgers from the tiered storage.

This table lists available options for the sn-pulsar-tools pck ts detect-missing-ledger command.

| Option | Description | Required or optional |
| --- | --- | --- |
| --auth-param | The authentication token for connecting to the Pulsar cluster. | Optional |
| --auth-plugin | The authentication method for connecting to the Pulsar cluster. | Optional |
| -b, --bucket | The bucket where the data is offloaded. gs:// refers to a GCS bucket and s3a://pulsar-offload refers to an AWS S3 bucket. | Required |
| -p, --parallel | The number of topics to be checked at the same time. | Optional |
| -s, --service-url | The service URL of the Pulsar cluster. | Required |
| -z, --zookeeper | The ZooKeeper connection string that the cluster uses. | Required |
| -t, --tenant | The tenant of the topic that you want to check. | Optional |
| -n, --namespace | The namespace of the topic that you want to check. | Optional |
| -t, --topic | The topic that you want to check. | Required |

Note

If it is not a public bucket, you need to configure the credentials before executing the command.

  • For GCS, you need to set GOOGLE_CLOUD_PROJECT_ID and GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE as environment variables. For example, if the bucket is located in the project affable-ray-226821 and the key file path is /pulsar/key.json, configure them as below:
export GOOGLE_CLOUD_PROJECT_ID=affable-ray-226821
export GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE=/pulsar/key.json
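
A pre-flight check can catch a missing variable or an unreadable key file before the PCK command runs. This is a minimal sketch; check_gcs_credentials is a hypothetical helper and not part of sn-pulsar-tools.

```shell
# Verify that both GCS variables are set and that the key file is readable.
# Returns 0 when everything is in place, 1 otherwise.
check_gcs_credentials() {
  if [ -z "${GOOGLE_CLOUD_PROJECT_ID:-}" ]; then
    echo "GOOGLE_CLOUD_PROJECT_ID is not set" >&2
    return 1
  fi
  if [ -z "${GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE:-}" ]; then
    echo "GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE is not set" >&2
    return 1
  fi
  if [ ! -r "${GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE}" ]; then
    echo "Key file is not readable: ${GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE}" >&2
    return 1
  fi
  return 0
}
```

Calling check_gcs_credentials before a pck ts command, and stopping if it fails, avoids starting a bucket scan that cannot authenticate.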

This example shows how to detect missing ledgers in the example topic of a locally-deployed cluster.

Input

/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools pck ts detect-missing-ledger -z localhost:2181 -b gs://pulsar-offload -s http://localhost:8080 -t example

The output is similar to:

Output

Detected missing ledger in the topic [example]: []

Find orphan ledgers

PCK fetches the topic ledger information from ZooKeeper and gets the range of the ledgers from the storage. Then, PCK compares the two to find orphan ledgers, that is, ledgers that exist in the storage but are not used by any Pulsar topic. You can use the find-orphan-ledger command to find these orphan ledgers.

Note

  • The find-orphan-ledger command is only available for Pulsar 2.3.0 or higher.
  • The find-orphan-ledger command is suitable for a Pulsar cluster that has a dedicated BookKeeper cluster. If multiple Pulsar clusters share the same BookKeeper cluster, do not use this command.

Find orphan ledgers from Bookie

You can use the sn-pulsar-tools pck bookie find-orphan-ledger [options] command to find the orphan ledgers from the Bookie.

This table lists available options for the sn-pulsar-tools pck bookie find-orphan-ledger command.

| Option | Description | Required or optional |
| --- | --- | --- |
| --auth-param | The authentication token for connecting to the Pulsar cluster. | Optional |
| --auth-plugin | The authentication method for connecting to the Pulsar cluster. | Optional |
| -c, --concurrency | The maximum number of concurrent operations. | Optional |
| -d, --delete | Delete the orphan ledgers. | Optional |
| -s, --service-url | The service URL of the Pulsar cluster. | Required |
| -t, --stale-time | The minimum stale time (in days) for topic ledgers. By default, it is set to one week. | Optional |
| -z, --zookeeper | The ZooKeeper connection string that the cluster uses. | Required |
| -zt, --zookeeper-timeout | The ZooKeeper session timeout. | Optional |

  • This example shows how to find orphan ledgers from the Bookie of a locally-deployed cluster.

    Input

    /pulsar/sn-pulsar-tools/bin/sn-pulsar-tools pck bookie find-orphan-ledger -z localhost:2181 -s http://localhost:8080
    

    The output is similar to:

    Output

    Found the orphan ledger in bookkeeper which is not referenced by any pulsar component
    365 : []
    

    365 is the orphan ledger and [] is the metadata of the ledger.

  • This example shows how to delete orphan ledgers from the Bookie of a locally-deployed cluster.

    Input

    /pulsar/sn-pulsar-tools/bin/sn-pulsar-tools pck bookie find-orphan-ledger -z localhost:2181 -s http://localhost:8080 -d -t 7
    

    The output is similar to:

    Output

    Found the orphan ledger in bookkeeper which is not referenced by any pulsar component
    365 : []
    Deleted ledger 365
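
Because the -d option deletes ledgers, it can help to capture the ledger IDs from a run without -d and review them first. The following is a minimal sketch, assuming each orphan ledger is printed as "<id> : <metadata>" as in the sample output; orphan_ledger_ids is a hypothetical helper name.

```shell
# Read find-orphan-ledger output on stdin and print one ledger ID per line.
# Only lines of the form "<digits> : ..." are treated as ledger entries, so
# the header line and "Deleted ledger ..." lines are ignored.
orphan_ledger_ids() {
  sed -n 's/^[[:space:]]*\([0-9][0-9]*\) : .*$/\1/p'
}

# Example with the sample output from this section.
printf '%s\n' \
  'Found the orphan ledger in bookkeeper which is not referenced by any pulsar component' \
  '365 : []' | orphan_ledger_ids
```

Piping the real command into orphan_ledger_ids would work the same way, provided the tool writes these lines to standard output, which is an assumption here.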
    

Find orphan ledgers from tiered storage

You can use the sn-pulsar-tools pck ts find-orphan-ledger [options] command to find the orphan ledgers from the tiered storage.

This table outlines available options for the sn-pulsar-tools pck ts find-orphan-ledger command.

| Option | Description | Required or optional |
| --- | --- | --- |
| --auth-param | The authentication token for connecting to the Pulsar cluster. | Optional |
| --auth-plugin | The authentication method for connecting to the Pulsar cluster. | Optional |
| -b, --bucket | The bucket where the data is offloaded. gs:// refers to a GCS bucket and s3a://pulsar-offload refers to an AWS S3 bucket. | Required |
| -z, --zookeeper | The ZooKeeper connection string that the cluster uses. | Required |

Note

If it is not a public bucket, you need to configure the credentials before executing the command.

  • For GCS, you need to set GOOGLE_CLOUD_PROJECT_ID and GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE as environment variables. For example, if the bucket is located in the project affable-ray-226821 and the key file path is /pulsar/key.json, configure them as below:
export GOOGLE_CLOUD_PROJECT_ID=affable-ray-226821
export GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE=/pulsar/key.json

This example shows how to find orphan ledgers in the GCS.

Input

/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools pck ts find-orphan-ledger -b gs://pulsar-offload -z localhost:2181

The output is similar to:

Output

Found the orphan ledgers in tiered storage which is not referenced by any topic
sink
source
function
15e155d2-cc91-4527-8570-bd99f98666d0-ledger-0
217bb7eb-1652-4343-b305-c8b415b1d02f-ledger-4
77d007b2-486f-4a06-985f-f4ef25760f26-ledger-3
e73f032e-165a-40f1-89bd-2233765d70e3-ledger-1
e89fe315-d355-4590-971d-1b605c949ff4-ledger-2
15e155d2-cc91-4527-8570-bd99f98666d0-ledger-0-index
217bb7eb-1652-4343-b305-c8b415b1d02f-ledger-4-index
77d007b2-486f-4a06-985f-f4ef25760f26-ledger-3-index
e73f032e-165a-40f1-89bd-2233765d70e3-ledger-1-index
e89fe315-d355-4590-971d-1b605c949ff4-ledger-2-index
test-packages-cloud-storage-7042eed2-1a85-40c8-b522-4fe287d5835b
test-packages-cloud-storage-de27c5db-b793-4d62-a323-dac527eafb83
test-packages-cloud-storage-e63951df-5671-4bbe-999b-98e4ecfae698
test-packages-cloud-storage-8dd0f3a5-2c43-4c76-b315-a541cc50a755
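
The sample output mixes ledger data objects, their index objects, and other bucket entries such as sink, source, and function. To look only at the ledger data objects, the list can be filtered by name. This is a sketch under one assumption: offloaded ledger objects are named "<UUID>-ledger-<number>", a pattern inferred from the sample output only. The helper name offloaded_ledger_objects is hypothetical.

```shell
# Read object names on stdin and keep only those that look like
# "<UUID>-ledger-<number>". Index objects ("...-index") and unrelated
# entries are dropped. "|| true" keeps the exit status 0 when nothing matches.
offloaded_ledger_objects() {
  grep -E '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}-ledger-[0-9]+$' || true
}

# Example with a few names taken from the sample output.
printf '%s\n' \
  'sink' \
  '15e155d2-cc91-4527-8570-bd99f98666d0-ledger-0' \
  '15e155d2-cc91-4527-8570-bd99f98666d0-ledger-0-index' \
  'test-packages-cloud-storage-7042eed2-1a85-40c8-b522-4fe287d5835b' |
  offloaded_ledger_objects
```

Entries that the filter drops are not necessarily safe to ignore; they are simply outside the naming pattern assumed here.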

Delete under-replicated ledgers from the Bookie

You can use the sn-pulsar-tools bookie delete-underreplicate-ledgers [options] command to delete the under-replicated ledgers from the Bookie.

This table outlines available options for the sn-pulsar-tools bookie delete-underreplicate-ledgers command.

| Option | Description | Required or optional |
| --- | --- | --- |
| --conf | The path to the BookKeeper configuration file. | Required |
| --dry-run | Run the process without making any modifications. | Optional |

This example shows how to preview the deletion of under-replicated ledgers from the Bookie by using the --dry-run option.

Input

/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools bookie delete-underreplicate-ledgers --conf ./bookkeeper.conf --dry-run

The output is similar to:

Output

Delete under replicate ledger [2] directly because of the metadata is not found
Ledger [1] ensembles are [[192.168.3.2:3182]], but currently have bookies [[]]. The ledger is unrecoverable, remove it from bookie metadata.
[Dry-run] Delete the ledger [1] metadata and remove it out of the under replicate ledger list

If you remove the --dry-run option, the output is similar to:

Output

Delete under replicate ledger [2] directly because the metadata is not found
Ledger [1] ensembles are [[192.168.3.2:3182]], but currently have bookies [[]]. The ledger is unrecoverable, remove it from bookie metadata.
Delete the ledger [1] metadata and remove it out of the under replicate ledger list
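
Before running without --dry-run, the dry-run output can be reduced to the list of ledger IDs that would be removed. The sketch below assumes the line formats shown in the sample outputs above; ledgers_marked_for_deletion is a hypothetical helper name.

```shell
# Read delete-underreplicate-ledgers output on stdin and print the unique
# ledger IDs that appear on "Delete ..." lines, sorted numerically.
# Matches both "[Dry-run] Delete the ledger [1] ..." and
# "Delete under replicate ledger [2] ..."; the "Ledger [1] ensembles are ..."
# line is ignored because it contains no "Delete " action.
ledgers_marked_for_deletion() {
  sed -n 's/^.*Delete .*ledger \[\([0-9][0-9]*\)\].*$/\1/p' | sort -n -u
}

# Example with the dry-run sample output from this section.
printf '%s\n' \
  'Delete under replicate ledger [2] directly because of the metadata is not found' \
  'Ledger [1] ensembles are [[192.168.3.2:3182]], but currently have bookies [[]]. The ledger is unrecoverable, remove it from bookie metadata.' \
  '[Dry-run] Delete the ledger [1] metadata and remove it out of the under replicate ledger list' |
  ledgers_marked_for_deletion
```

Comparing this list between a dry run and the real run is one way to confirm that the real run touched only the ledgers that were previewed.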

Load inactive topics

You can use the sn-pulsar-tools topics load-inactive-topic [options] command to load the inactive topics in the Pulsar broker. After the topics are loaded, the Pulsar broker checks the ConsumedLedgers in every topic, which helps delete data that has already reached the retention limit.

This table outlines available options for the sn-pulsar-tools topics load-inactive-topic command.

OptionDescriptionRequired or optional
--auth-paramThe authentication token for connecting to the Pulsar cluster.Optional
--auth-pluginThe authentication method for connecting to the Pulsar cluster.Optional
-i, --inactive-dayThe minimum inactive time (in days) for topics. By default, it is set to one week.Optional
-n, --namespaceThe namespace of the topic that you want to check.Required
-s, --service-urlThe service URL of the Pulsar cluster.Required
-t, --tenantThe tenant of the topic that you want to check.Required
-z, --zookeeperThe ZooKeeper connection string that the cluster uses.Required
-zt, --zookeeper-timeoutThe ZooKeeper session timeout.Optional

This example shows how to load inactive topics for a locally-deployed cluster.

Input

/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools topics load-inactive-topic -z localhost:2181 -s http://localhost:8080 -t public -n default -i 7

The output is similar to:

Output

Load inactive topic: persistent://public/default/horizon-partition-0
Load inactive topic: persistent://public/default/horizon-partition-1
Load inactive topic: persistent://public/default/horizon-partition-2
Load inactive topic: persistent://public/default/horizon-partition-3
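
The output lists each partition separately. To summarize which topics were loaded, the partition suffix can be stripped and the names deduplicated. This is a minimal sketch that relies on the Pulsar naming of partitions as "<topic>-partition-<n>" and on the "Load inactive topic: " prefix shown above; the helper names loaded_topics and loaded_base_topics are hypothetical.

```shell
# Read load-inactive-topic output on stdin and print the loaded topic names.
loaded_topics() {
  sed -n 's/^Load inactive topic: //p'
}

# Collapse partitions into their parent topic and remove duplicates.
loaded_base_topics() {
  loaded_topics | sed 's/-partition-[0-9][0-9]*$//' | sort -u
}

# Example with two lines from the sample output.
printf '%s\n' \
  'Load inactive topic: persistent://public/default/horizon-partition-0' \
  'Load inactive topic: persistent://public/default/horizon-partition-1' |
  loaded_base_topics
```

Non-partitioned topics pass through loaded_base_topics unchanged because they carry no partition suffix.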