Tutorials
PCK currently provides several commands for checking the consistency between Pulsar and its storage, including BookKeeper and tiered storage (GCS, AWS S3). This document shows you how to detect missing ledgers, find orphan ledgers, delete under-replicated ledgers, and load inactive topics.
Detect missing ledgers
PCK gets all ledgers from a topic and then tries to find them in the storage. If a ledger is not found in the storage, PCK prints the missing ledger to the terminal. You can use the detect-missing-ledger command to detect ledgers that exist in the Pulsar metadata but not in the Bookie or the tiered storage.
Detect missing ledgers from Bookie
You can use the sn-pulsar-tools pck bookie detect-missing-ledger [options] command to detect the missing ledgers from the Bookie.
This table lists the available options for the sn-pulsar-tools pck bookie detect-missing-ledger command.
Option | Description | Required or optional |
---|---|---|
--auth-param | The authentication token for connecting to the Pulsar cluster. | Optional |
--auth-plugin | The authentication method for connecting to the Pulsar cluster. | Optional |
-s , --service-url | The service URL of the Pulsar cluster. | Required |
-z , --zookeeper | The ZooKeeper connection string that the cluster uses. | Required |
-t , --tenant | The tenant of the topic that you want to check. | Optional |
-n , --namespace | The namespace of the topic that you want to check. | Optional |
-t , --topic | The topic that you want to check. | Required |
This example shows how to detect missing ledgers in the example topic of a locally-deployed cluster.
Input
/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools pck bookie detect-missing-ledger -z localhost:2181 -s http://localhost:8080 -t example
The output is similar to:
Output
Detected missing ledger in the topics [example]: []
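If you run this check from a script, you may want to turn the report into an exit status. The following is a minimal sketch, assuming the report line keeps the shape shown above: the topic in brackets, a colon, and then a bracketed list that is empty when nothing is missing. The function name missing_ledgers_reported is hypothetical and not part of PCK, and the format of a non-empty list is an assumption.

```shell
#!/bin/sh
# Hypothetical helper, not part of PCK: reads PCK output on stdin and
# succeeds (exit 0) only if a "Detected missing ledger" line carries a
# non-empty list. Assumes an empty result is printed as ": []" at the end
# of the line, as in the sample output.
missing_ledgers_reported() {
  # Keep the report lines, then drop those that end in an empty list.
  grep 'Detected missing ledger' | grep -v ': \[ *][[:space:]]*$' >/dev/null
}
```

For example, you could pipe the command output into the helper and raise an alert only when it succeeds: `... detect-missing-ledger ... | missing_ledgers_reported && echo "missing ledgers found"`.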
Detect missing ledgers from tiered storage
You can use the sn-pulsar-tools pck ts detect-missing-ledger [options] command to detect the missing ledgers from the tiered storage.
This table lists the available options for the sn-pulsar-tools pck ts detect-missing-ledger command.
Option | Description | Required or optional |
---|---|---|
--auth-param | The authentication token for connecting to the Pulsar cluster. | Optional |
--auth-plugin | The authentication method for connecting to the Pulsar cluster. | Optional |
-b , --bucket | The bucket where the data is offloaded. gs:// refers to a GCS bucket and s3a://pulsar-offload refers to an AWS S3 bucket. | Required |
-p , --parallel | The number of topics to be checked at the same time. | Optional |
-s , --service-url | The service URL of the Pulsar cluster. | Required |
-z , --zookeeper | The ZooKeeper connection string that the cluster uses. | Required |
-t , --tenant | The tenant of the topic that you want to check. | Optional |
-n , --namespace | The namespace of the topic that you want to check. | Optional |
-t , --topic | The topic that you want to check. | Required |
Note
If the bucket is not public, you need to configure the credentials before executing the command.
- For GCS, you need to configure GOOGLE_CLOUD_PROJECT_ID and GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE as environment variables. For example, if the bucket is located in the project affable-ray-226821 and the key file path is /pulsar/key.json, you need to configure them as below:
  export GOOGLE_CLOUD_PROJECT_ID=affable-ray-226821
  export GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE=/pulsar/key.json
- For AWS S3, you can configure the AWS credentials as environment variables.
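Before running the command against a private GCS bucket, it can help to confirm that both variables are set. The following is a minimal sketch of such a pre-flight check. The function name check_gcs_credentials is hypothetical and not part of PCK. It only verifies that the variables exist and that the key file is readable, not that the credentials are valid.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of PCK: verifies that the two GCS
# variables named in the note are set and that the key file is readable.
check_gcs_credentials() {
  if [ -z "${GOOGLE_CLOUD_PROJECT_ID:-}" ]; then
    echo "GOOGLE_CLOUD_PROJECT_ID is not set" >&2
    return 1
  fi
  if [ -z "${GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE:-}" ]; then
    echo "GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE is not set" >&2
    return 1
  fi
  if [ ! -r "${GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE}" ]; then
    echo "Key file is not readable: ${GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE}" >&2
    return 1
  fi
}
```

You could then guard the PCK call with it, for example `check_gcs_credentials && /pulsar/sn-pulsar-tools/bin/sn-pulsar-tools pck ts detect-missing-ledger ...`.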
This example shows how to detect missing ledgers in the example topic of a locally-deployed cluster.
Input
/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools pck ts detect-missing-ledger -z localhost:2181 -b gs://pulsar-offload -s http://localhost:8080 -t example
The output is similar to:
Output
Detected missing ledger in the topic [example]: []
Find orphan ledgers
PCK fetches the topic ledger information from ZooKeeper and gets the range of ledgers from the storage. PCK then compares the two to find orphan ledgers, that is, ledgers that exist in the storage but are not used by any topic. You can use the find-orphan-ledger command to find these orphan ledgers.
Note
- The find-orphan-ledger command is only available for Pulsar 2.3.0 or higher.
- The find-orphan-ledger command is suitable for a Pulsar cluster that has its own dedicated BookKeeper cluster. If multiple Pulsar clusters share the same BookKeeper cluster, do not use this command.
Find orphan ledgers from Bookie
You can use the sn-pulsar-tools pck bookie find-orphan-ledger [options] command to find the orphan ledgers from the Bookie.
This table lists the available options for the sn-pulsar-tools pck bookie find-orphan-ledger command.
Option | Description | Required or optional |
---|---|---|
--auth-param | The authentication token for connecting to the Pulsar cluster. | Optional |
--auth-plugin | The authentication method for connecting to the Pulsar cluster. | Optional |
-c , --concurrency | The maximum number of concurrent operations. | Optional |
-d , --delete | Delete the orphan ledgers. | Optional |
-s , --service-url | The service URL of the Pulsar cluster. | Required |
-t , --stale-time | The minimum stale time (in days) for topic ledgers. By default, it is set to one week. | Optional |
-z , --zookeeper | The ZooKeeper connection string that the cluster uses. | Required |
-zt , --zookeeper-timeout | The ZooKeeper session timeout. | Optional |
This example shows how to find orphan ledgers from the Bookie of a locally-deployed cluster.
Input
/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools pck bookie find-orphan-ledger -z localhost:2181 -s http://localhost:8080
The output is similar to:
Output
Found the orphan ledger in bookkeeper which is not referenced by any pulsar component 365 : []
365 is the orphan ledger and [] is the metadata of the ledger.
This example shows how to delete orphan ledgers from the Bookie of a locally-deployed cluster.
Input
/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools pck bookie find-orphan-ledger -z localhost:2181 -s http://localhost:8080 -d -t 7
The output is similar to:
Output
Found the orphan ledger in bookkeeper which is not referenced by any pulsar component 365 : []
Deleted ledger 365
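Because the -d option deletes ledgers, you may want to review the ledger IDs from a report-only run first. The following is a minimal sketch that extracts the IDs from the report. It assumes each report line ends with the ledger ID, a colon, and the metadata, as in the sample output. The function name extract_orphan_ledger_ids is hypothetical and not part of PCK.

```shell
#!/bin/sh
# Hypothetical helper, not part of PCK: reads PCK output on stdin and prints
# one orphan ledger ID per line. Assumes report lines of the form
# "Found the orphan ledger in bookkeeper ... component <id> : <metadata>".
extract_orphan_ledger_ids() {
  sed -n 's/^Found the orphan ledger in bookkeeper.* component \([0-9][0-9]*\) : .*$/\1/p'
}
```

For example, `... find-orphan-ledger -z localhost:2181 -s http://localhost:8080 | extract_orphan_ledger_ids > orphan-ledgers.txt` saves the IDs to a file (the file name is illustrative) so that you can inspect them before rerunning the command with -d.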
Find orphan ledgers from tiered storage
You can use the sn-pulsar-tools pck ts find-orphan-ledger [options] command to find the orphan ledgers from the tiered storage.
This table outlines the available options for the sn-pulsar-tools pck ts find-orphan-ledger command.
Option | Description | Required or optional |
---|---|---|
--auth-param | The authentication token for connecting to the Pulsar cluster. | Optional |
--auth-plugin | The authentication method for connecting to the Pulsar cluster. | Optional |
-b , --bucket | The bucket where the data is offloaded. gs:// refers to a GCS bucket and s3a://pulsar-offload refers to an AWS S3 bucket. | Required |
-z , --zookeeper | The ZooKeeper connection string that the cluster uses. | Required |
Note
If the bucket is not public, you need to configure the credentials before executing the command.
- For GCS, you need to configure GOOGLE_CLOUD_PROJECT_ID and GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE as environment variables. For example, if the bucket is located in the project affable-ray-226821 and the key file path is /pulsar/key.json, you need to configure them as below:
  export GOOGLE_CLOUD_PROJECT_ID=affable-ray-226821
  export GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE=/pulsar/key.json
- For AWS S3, you can configure the AWS credentials as environment variables.
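For AWS S3, the note does not name the variables. The following is a minimal sketch of a pre-flight check that assumes PCK reads the standard AWS SDK variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. This is an assumption, so confirm it against your deployment. The function name check_aws_credentials is hypothetical and not part of PCK.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of PCK. AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY are the standard AWS SDK variable names; that PCK
# reads them is an assumption.
check_aws_credentials() {
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      return 1
    fi
  done
}
```

The check only confirms that the variables are non-empty. It does not contact AWS or validate the keys.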
This example shows how to find orphan ledgers in the GCS.
Input
/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools pck ts find-orphan-ledger -b gs://pulsar-offload -z localhost:2181
The output is similar to:
Output
Found the orphan ledgers in tiered storage which is not referenced by any topic
sink
source
function
15e155d2-cc91-4527-8570-bd99f98666d0-ledger-0
217bb7eb-1652-4343-b305-c8b415b1d02f-ledger-4
77d007b2-486f-4a06-985f-f4ef25760f26-ledger-3
e73f032e-165a-40f1-89bd-2233765d70e3-ledger-1
e89fe315-d355-4590-971d-1b605c949ff4-ledger-2
15e155d2-cc91-4527-8570-bd99f98666d0-ledger-0-index
217bb7eb-1652-4343-b305-c8b415b1d02f-ledger-4-index
77d007b2-486f-4a06-985f-f4ef25760f26-ledger-3-index
e73f032e-165a-40f1-89bd-2233765d70e3-ledger-1-index
e89fe315-d355-4590-971d-1b605c949ff4-ledger-2-index
test-packages-cloud-storage-7042eed2-1a85-40c8-b522-4fe287d5835b
test-packages-cloud-storage-de27c5db-b793-4d62-a323-dac527eafb83
test-packages-cloud-storage-e63951df-5671-4bbe-999b-98e4ecfae698
test-packages-cloud-storage-8dd0f3a5-2c43-4c76-b315-a541cc50a755
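The listing above mixes ledger data objects and index objects with other bucket entries. The following is a minimal sketch that keeps only the entries that look like offloaded ledger objects. The naming pattern (a UUID, then -ledger-, a number, and an optional -index suffix) is inferred from the sample output only and may not cover every deployment. The function name filter_offloaded_ledger_objects is hypothetical and not part of PCK.

```shell
#!/bin/sh
# Hypothetical helper, not part of PCK: reads object names on stdin and keeps
# only those shaped like "<uuid>-ledger-<n>" or "<uuid>-ledger-<n>-index",
# the pattern seen in the sample output.
filter_offloaded_ledger_objects() {
  grep -E '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}-ledger-[0-9]+(-index)?$'
}
```

Entries such as sink, source, function, and the test-packages-cloud-storage-* names are dropped by this filter, which makes the ledger objects easier to review.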
Delete under-replicated ledgers from the Bookie
You can use the sn-pulsar-tools bookie delete-underreplicate-ledgers [options] command to delete the under-replicated ledgers from the Bookie.
This table outlines the available options for the sn-pulsar-tools bookie delete-underreplicate-ledgers command.
Option | Description | Required or optional |
---|---|---|
-conf | The path to the BookKeeper configuration file. | Required |
--dry-run | Configure whether to run the process without any modifications. | Optional |
This example shows how to preview the deletion of under-replicated ledgers from the Bookie with the --dry-run option.
Input
/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools bookie delete-underreplicate-ledgers --conf ./bookkeeper.conf --dry-run
The output is similar to:
Output
Delete under replicate ledger [2] directly because of the metadata is not found
Ledger [1] ensembles are [[192.168.3.2:3182]], but currently have bookies [[]]. The ledger is unrecoverable, remove it from bookie metadata.
[Dry-run] Delete the ledger [1] metadata and remove it out of the under replicate ledger list
If you remove the --dry-run option, the output is similar to:
Output
Delete under replicate ledger [2] directly because the metadata is not found
Ledger [1] ensembles are [[192.168.3.2:3182]], but currently have bookies [[]]. The ledger is unrecoverable, remove it from bookie metadata.
Delete the ledger [1] metadata and remove it out of the under replicate ledger list
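Because the deletion cannot be undone, one cautious pattern is to always run with --dry-run first and delete only after an explicit opt-in. The following is a minimal sketch of such a wrapper. The function name delete_underreplicated_ledgers and the variables SN_PULSAR_TOOLS and CONFIRM_DELETE are hypothetical names introduced here. The sketch uses --conf as in the example command, although the options table lists -conf.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of sn-pulsar-tools. SN_PULSAR_TOOLS and
# CONFIRM_DELETE are illustrative variable names introduced for this sketch.
SN_PULSAR_TOOLS="${SN_PULSAR_TOOLS:-/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools}"

delete_underreplicated_ledgers() {
  conf_file="$1"
  # Always preview first; stop if the preview itself fails.
  "$SN_PULSAR_TOOLS" bookie delete-underreplicate-ledgers \
    --conf "$conf_file" --dry-run || return 1
  # Delete for real only when the caller opts in explicitly.
  if [ "${CONFIRM_DELETE:-no}" = "yes" ]; then
    "$SN_PULSAR_TOOLS" bookie delete-underreplicate-ledgers --conf "$conf_file"
  fi
}
```

Running `delete_underreplicated_ledgers ./bookkeeper.conf` only previews the changes. Running `CONFIRM_DELETE=yes; delete_underreplicated_ledgers ./bookkeeper.conf` previews them and then performs the deletion.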
Load inactive topics
You can use the sn-pulsar-tools topics load-inactive-topic [options] command to load the inactive topics in the Pulsar broker. After the topics are loaded, the Pulsar broker checks the ConsumedLedgers in every topic. This helps delete data that has already reached the retention limit.
This table outlines the available options for the sn-pulsar-tools topics load-inactive-topic command.
Option | Description | Required or optional |
---|---|---|
--auth-param | The authentication token for connecting to the Pulsar cluster. | Optional |
--auth-plugin | The authentication method for connecting to the Pulsar cluster. | Optional |
-i , --inactive-day | The minimum inactive time (in days) for topics. By default, it is set to one week. | Optional |
-n , --namespace | The namespace of the topic that you want to check. | Required |
-s , --service-url | The service URL of the Pulsar cluster. | Required |
-t , --tenant | The tenant of the topic that you want to check. | Required |
-z , --zookeeper | The ZooKeeper connection string that the cluster uses. | Required |
-zt , --zookeeper-timeout | The ZooKeeper session timeout. | Optional |
This example shows how to load inactive topics for a locally-deployed cluster.
Input
/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools topics load-inactive-topic -z localhost:2181 -s http://localhost:8080 -t public -n default -i 7
The output is similar to:
Output
Load inactive topic: persistent://public/default/horizon-partition-0
Load inactive topic: persistent://public/default/horizon-partition-1
Load inactive topic: persistent://public/default/horizon-partition-2
Load inactive topic: persistent://public/default/horizon-partition-3
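The command takes one tenant and one namespace per run. To cover several namespaces, you can loop over them. The following is a minimal sketch that reuses the connection settings from the example above. The function name load_inactive_topics and the variable SN_PULSAR_TOOLS are hypothetical names introduced here, and the ZooKeeper and service URLs are the local values from the example.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of sn-pulsar-tools. SN_PULSAR_TOOLS is an
# illustrative variable name; the URLs match the local example above.
SN_PULSAR_TOOLS="${SN_PULSAR_TOOLS:-/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools}"

load_inactive_topics() {
  tenant="$1"
  shift
  # Remaining arguments are namespaces of that tenant; stop at the first failure.
  for namespace in "$@"; do
    "$SN_PULSAR_TOOLS" topics load-inactive-topic \
      -z localhost:2181 -s http://localhost:8080 \
      -t "$tenant" -n "$namespace" -i 7 || return 1
  done
}
```

For example, `load_inactive_topics public default` runs the same command as the example above, and you can pass more namespaces as extra arguments.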