Tutorials
Currently, PCK provides several commands to check the consistency between Pulsar and its storage, including BookKeeper and tiered storage (GCS, AWS). This document explains how to find orphan ledgers, detect missing ledgers, delete under-replicated ledgers, and load inactive topics.
Detect missing ledgers
PCK gets all ledgers of a topic and then looks for them in the storage. If a ledger is not found in the storage, PCK prints the missing ledger to the terminal. You can use the detect-missing-ledger command to detect ledgers that exist in the Pulsar metadata but not in the Bookie or the tiered storage.
Detect missing ledgers from Bookie
You can use the sn-pulsar-tools pck bookie detect-missing-ledger [options] command to detect the missing ledgers from the Bookie.
This table lists available options for the sn-pulsar-tools pck bookie detect-missing-ledger command.
Option | Description | Required or optional |
---|---|---|
--auth-param | The authentication token for connecting to the Pulsar cluster. | Optional |
--auth-plugin | The authentication method for connecting to the Pulsar cluster. | Optional |
-s , --service-url | The service URL of the Pulsar cluster. | Required |
-z , --zookeeper | The ZooKeeper connection string that the cluster uses. | Required |
-t , --tenant | The tenant of the topic that you want to check. | Optional |
-n , --namespace | The namespace of the topic that you want to check. | Optional |
-t , --topic | The topic that you want to check. | Required |
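As a rough sketch of how the options in this table fit together, the following script assembles a detect-missing-ledger invocation. The service URL and ZooKeeper address are assumed values for a local deployment and do not come from this document. The script prints the command and runs it only when sn-pulsar-tools is on the PATH.

```shell
# Assumed values for a local cluster; replace them with your own.
SERVICE_URL="http://localhost:8080"   # assumption: local service URL
ZOOKEEPER="localhost:2181"            # assumption: local ZooKeeper address
TOPIC="example"                       # topic name used in the example

# Assemble the command from the required options in the table above.
set -- sn-pulsar-tools pck bookie detect-missing-ledger \
  --service-url "$SERVICE_URL" \
  --zookeeper "$ZOOKEEPER" \
  --topic "$TOPIC"

# Print the command for review, then run it only if the tool is installed.
CMD_LINE="$*"
echo "$CMD_LINE"
if command -v sn-pulsar-tools >/dev/null 2>&1; then
  "$@"
fi
```

Printing the command before running it makes it easy to confirm which cluster and topic are being checked.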
This example shows how to detect missing ledgers in the example topic of a locally-deployed cluster.
Input
The output is similar to:
Output
Detect missing ledgers from tiered storage
You can use the sn-pulsar-tools pck ts detect-missing-ledger [options] command to detect the missing ledgers from the tiered storage.
This table lists available options for the sn-pulsar-tools pck ts detect-missing-ledger command.
Option | Description | Required or optional |
---|---|---|
--auth-param | The authentication token for connecting to the Pulsar cluster. | Optional |
--auth-plugin | The authentication method for connecting to the Pulsar cluster. | Optional |
-b , --bucket | The bucket where the data is offloaded. gs:// refers to a GCS bucket and s3a://pulsar-offload refers to an AWS S3 bucket. | Required |
-p , --parallel | The number of topics to be checked at the same time. | Optional |
-s , --service-url | The service URL of the Pulsar cluster. | Required |
-z , --zookeeper | The ZooKeeper connection string that the cluster uses. | Required |
-t , --tenant | The tenant of the topic that you want to check. | Optional |
-n , --namespace | The namespace of the topic that you want to check. | Optional |
-t , --topic | The topic that you want to check. | Required |
If the bucket is not public, you need to configure the credentials before executing the command.
- For GCS, you need to configure GOOGLE_CLOUD_PROJECT_ID and GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE as environment variables. For example, if the bucket is located in the project affable-ray-226821 and the key file path is /pulsar/key.json, set GOOGLE_CLOUD_PROJECT_ID to affable-ray-226821 and GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE to /pulsar/key.json.
- For AWS S3, you can configure the AWS credentials as the environment variables.
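For the GCS case described above, the two variables could be exported as follows. The values are the ones given in the text. The commented AWS lines use the standard AWS SDK variable names with placeholder values; that PCK reads exactly these variables is an assumption, not something this document states.

```shell
# GCS: project ID and service account key file from the example in the text.
export GOOGLE_CLOUD_PROJECT_ID="affable-ray-226821"
export GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE="/pulsar/key.json"

# AWS S3 (assumption: the standard AWS SDK variables are honored).
# Replace the placeholders with real credentials before uncommenting.
# export AWS_ACCESS_KEY_ID="<access-key-id>"
# export AWS_SECRET_ACCESS_KEY="<secret-access-key>"
```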
This example shows how to detect missing ledgers in the example topic of a locally-deployed cluster.
Input
The output is similar to:
Output
Find orphan ledgers
PCK fetches the topic ledger information from ZooKeeper and gets the range of ledgers from the storage. Then, PCK compares the two to find orphan ledgers, that is, ledgers that exist in the storage but are not used by the topic. You can use the find-orphan-ledger command to find the orphan ledgers that are not used in the Pulsar topic.
- The find-orphan-ledger command is only available for Pulsar 2.3.0 or higher.
- The find-orphan-ledger command is suitable for a Pulsar cluster that has its own dedicated BookKeeper cluster. If multiple Pulsar clusters use the same BookKeeper cluster, do not use this command.
Find orphan ledgers from Bookie
You can use the sn-pulsar-tools pck bookie find-orphan-ledger [options] command to find the orphan ledgers from the Bookie.
This table lists available options for the sn-pulsar-tools pck bookie find-orphan-ledger command.
Option | Description | Required or optional |
---|---|---|
--auth-param | The authentication token for connecting to the Pulsar cluster. | Optional |
--auth-plugin | The authentication method for connecting to the Pulsar cluster. | Optional |
-c , --concurrency | The maximum number of concurrent operations. | Optional |
-d , --delete | Delete the orphan ledgers. | Optional |
-s , --service-url | The service URL of the Pulsar cluster. | Required |
-t , --stale-time | The minimum stale time (in days) for topic ledgers. By default, it is set to one week. | Optional |
-z , --zookeeper | The ZooKeeper connection string that the cluster uses. | Required |
-zt , --zookeeper-timeout | The ZooKeeper session timeout. | Optional |
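Because --delete is destructive, one cautious pattern is to list orphan ledgers first and add --delete only after reviewing the result. The sketch below illustrates this with the options from the table above. The service URL, ZooKeeper address, the plain integer format for --stale-time, and the DELETE toggle variable are all assumptions made for illustration.

```shell
# Assumed values for a local cluster; replace them with your own.
SERVICE_URL="http://localhost:8080"   # assumption: local service URL
ZOOKEEPER="localhost:2181"            # assumption: local ZooKeeper address
STALE_DAYS=7                          # one week, the documented default
DELETE="${DELETE:-no}"                # hypothetical toggle: DELETE=yes adds --delete

# Start with a listing-only command.
set -- sn-pulsar-tools pck bookie find-orphan-ledger \
  --service-url "$SERVICE_URL" \
  --zookeeper "$ZOOKEEPER" \
  --stale-time "$STALE_DAYS"

# Append --delete only when explicitly requested.
if [ "$DELETE" = "yes" ]; then
  set -- "$@" --delete
fi

# Print the command for review, then run it only if the tool is installed.
CMD_LINE="$*"
echo "$CMD_LINE"
if command -v sn-pulsar-tools >/dev/null 2>&1; then
  "$@"
fi
```

Running the script once without DELETE set and once with DELETE=yes keeps the review step separate from the deletion step.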
- This example shows how to find orphan ledgers from the Bookie of a locally-deployed cluster.
Input
The output is similar to:
Output
365 is the orphan ledger and [] is the metadata of the ledger.
- This example shows how to delete orphan ledgers from the Bookie of a locally-deployed cluster.
Input
The output is similar to:
Output
Find orphan ledgers from tiered storage
You can use the sn-pulsar-tools pck ts find-orphan-ledger [options] command to find the orphan ledgers from the tiered storage.
This table outlines available options for the sn-pulsar-tools pck ts find-orphan-ledger command.
Option | Description | Required or optional |
---|---|---|
--auth-param | The authentication token for connecting to the Pulsar cluster. | Optional |
--auth-plugin | The authentication method for connecting to the Pulsar cluster. | Optional |
-b , --bucket | The bucket where the data is offloaded. gs:// refers to a GCS bucket and s3a://pulsar-offload refers to an AWS S3 bucket. | Required |
-z , --zookeeper | The ZooKeeper connection string that the cluster uses. | Required |
If the bucket is not public, you need to configure the credentials before executing the command.
- For GCS, you need to configure GOOGLE_CLOUD_PROJECT_ID and GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE as environment variables. For example, if the bucket is located in the project affable-ray-226821 and the key file path is /pulsar/key.json, set GOOGLE_CLOUD_PROJECT_ID to affable-ray-226821 and GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE to /pulsar/key.json.
- For AWS S3, you can configure the AWS credentials as the environment variables.
This example shows how to find orphan ledgers in GCS.
Input
The output is similar to:
Output
Delete under-replicated ledgers from the Bookie
You can use the sn-pulsar-tools bookie delete-underreplicate-ledgers [options] command to delete the under-replicated ledgers from the Bookie.
This table outlines available options for the sn-pulsar-tools bookie delete-underreplicate-ledgers command.
Option | Description | Required or optional |
---|---|---|
-conf | The path to the BookKeeper configuration file. | Required |
--dry-run | Run the process without making any modifications. | Optional |
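The two options above can be combined into a dry run first, followed by a real run once the reported ledgers look right. This is a minimal sketch: the configuration file path and the DRY_RUN toggle variable are hypothetical, and --dry-run is assumed to be a flag that takes no value.

```shell
# Hypothetical path to the BookKeeper configuration file; adjust as needed.
BK_CONF="/pulsar/conf/bookkeeper.conf"
DRY_RUN="${DRY_RUN:-yes}"             # hypothetical toggle: DRY_RUN=no performs the deletion

# Base command with the required -conf option.
set -- sn-pulsar-tools bookie delete-underreplicate-ledgers -conf "$BK_CONF"

# Keep the run non-destructive unless DRY_RUN is explicitly disabled.
if [ "$DRY_RUN" = "yes" ]; then
  set -- "$@" --dry-run
fi

# Print the command for review, then run it only if the tool is installed.
CMD_LINE="$*"
echo "$CMD_LINE"
if command -v sn-pulsar-tools >/dev/null 2>&1; then
  "$@"
fi
```

Defaulting to a dry run means the script deletes nothing unless DRY_RUN=no is set.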
This example shows how to delete under-replicated ledgers from the Bookie.
Input
The output is similar to:
Output
If you remove the --dry-run option, the output is similar to:
Output
Load inactive topics
You can use the sn-pulsar-tools topics load-inactive-topic [options] command to load the inactive topics in the Pulsar broker. After the topics are loaded, the Pulsar broker checks the ConsumedLedgers of every topic. This helps delete data that has already reached the retention limit.
This table outlines available options for the sn-pulsar-tools topics load-inactive-topic command.
Option | Description | Required or optional |
---|---|---|
--auth-param | The authentication token for connecting to the Pulsar cluster. | Optional |
--auth-plugin | The authentication method for connecting to the Pulsar cluster. | Optional |
-i , --inactive-day | The minimum inactive time (in days) for topics. By default, it is set to one week. | Optional |
-n , --namespace | The namespace of the topic that you want to check. | Required |
-s , --service-url | The service URL of the Pulsar cluster. | Required |
-t , --tenant | The tenant of the topic that you want to check. | Required |
-z , --zookeeper | The ZooKeeper connection string that the cluster uses. | Required |
-zt , --zookeeper-timeout | The ZooKeeper session timeout. | Optional |
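Since the command works on one tenant and namespace at a time, covering several namespaces takes a loop. The sketch below shows one way to do that with the options from the table above. The tenant and namespace names, the service URL, the ZooKeeper address, and the bare namespace format (without the tenant prefix) are assumptions.

```shell
# Assumed values for a local cluster; replace them with your own.
SERVICE_URL="http://localhost:8080"   # assumption: local service URL
ZOOKEEPER="localhost:2181"            # assumption: local ZooKeeper address
TENANT="public"                       # assumption: tenant name
NAMESPACES="default"                  # assumption: space-separated namespace names
INACTIVE_DAYS=7                       # one week, the documented default

LAST_CMD=""
for ns in $NAMESPACES; do
  # Assemble the command for this namespace.
  set -- sn-pulsar-tools topics load-inactive-topic \
    --service-url "$SERVICE_URL" \
    --zookeeper "$ZOOKEEPER" \
    --tenant "$TENANT" \
    --namespace "$ns" \
    --inactive-day "$INACTIVE_DAYS"

  # Print the command for review, then run it only if the tool is installed.
  LAST_CMD="$*"
  echo "$LAST_CMD"
  if command -v sn-pulsar-tools >/dev/null 2>&1; then
    "$@"
  fi
done
```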
This example shows how to load inactive topics for a locally-deployed cluster.
The output is similar to:
Output