# Tutorials

PCK currently provides several commands that check the consistency between Pulsar and its storage, including BookKeeper and tiered storage (GCS, AWS S3). This document shows you how to detect missing ledgers, find orphan ledgers, delete under-replicated ledgers, and load inactive topics.

## Detect missing ledgers

PCK gets all ledgers of a topic and then looks for them in the storage. If a ledger is not found in the storage, PCK prints the missing ledger to the terminal. You can use the `detect-missing-ledger` command to detect ledgers that exist in the Pulsar metadata but not in the Bookie or the tiered storage.
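The comparison described above amounts to a set difference: ledger IDs known to the metadata minus ledger IDs present in the storage. The following is a minimal sketch of that idea only, not PCK's implementation. The `missing_ledgers` helper and both input files are hypothetical, and each file is assumed to hold one ledger ID per line.

```bash
#!/usr/bin/env bash
# Illustration only: mirrors the comparison described above, not PCK's code.

# Print every ID in the metadata list that is absent from the storage list.
# Usage: missing_ledgers <metadata-ids-file> <storage-ids-file>
missing_ledgers() {
  local metadata_ids="$1" storage_ids="$2"
  # First file (storage) fills a lookup table; second file (metadata) is
  # filtered against it. FILENAME is compared with ARGV[1] so that an empty
  # storage file is still handled correctly.
  awk 'FILENAME == ARGV[1] { in_storage[$0] = 1; next }
       !($0 in in_storage)' "$storage_ids" "$metadata_ids"
}

# Hypothetical sample data.
workdir="$(mktemp -d)"
printf '%s\n' 101 102 103 > "$workdir/metadata.txt"
printf '%s\n' 101 103 > "$workdir/storage.txt"

missing_ledgers "$workdir/metadata.txt" "$workdir/storage.txt"   # prints 102
```

Swapping the two arguments gives the opposite difference: IDs that exist in the storage but are not referenced by the metadata.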

### Detect missing ledgers from Bookie

You can use the `sn-pulsar-tools pck bookie detect-missing-ledger [options]` command to detect the missing ledgers from the Bookie.

This table lists available options for the `sn-pulsar-tools pck bookie detect-missing-ledger` command.

| Option                | Description                                                     | Required or optional |
| --------------------- | --------------------------------------------------------------- | -------------------- |
| `--auth-param`        | The authentication token for connecting to the Pulsar cluster.  | Optional             |
| `--auth-plugin`       | The authentication method for connecting to the Pulsar cluster. | Optional             |
| `-s`, `--service-url` | The service URL of the Pulsar cluster.                          | Required             |
| `-z`, `--zookeeper`   | The ZooKeeper connection string that the cluster uses.          | Required             |
| `-t`, `--tenant`      | The tenant of the topic that you want to check.                 | Optional             |
| `-n`, `--namespace`   | The namespace of the topic that you want to check.              | Optional             |
| `-t`, `--topic`       | The topic that you want to check.                               | Required             |

This example shows how to detect missing ledgers in the `example` topic of a locally-deployed cluster.

**Input**

```bash theme={null}
/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools pck bookie detect-missing-ledger -z localhost:2181 -s http://localhost:8080 -t example
```

The output is similar to:

**Output**

```bash theme={null}
Detected missing ledger in the topics [example]: []
```

### Detect missing ledgers from tiered storage

You can use the `sn-pulsar-tools pck ts detect-missing-ledger [options]` command to detect the missing ledgers from the tiered storage.

This table lists available options for the `sn-pulsar-tools pck ts detect-missing-ledger` command.

| Option                | Description                                                                                                                  | Required or optional |
| --------------------- | ---------------------------------------------------------------------------------------------------------------------------- | -------------------- |
| `--auth-param`        | The authentication token for connecting to the Pulsar cluster.                                                               | Optional             |
| `--auth-plugin`       | The authentication method for connecting to the Pulsar cluster.                                                              | Optional             |
| `-b`, `--bucket`      | The bucket where the data is offloaded. `gs://` refers to a GCS bucket and `s3a://pulsar-offload` refers to an AWS S3 bucket. | Required             |
| `-p`, `--parallel`    | The number of topics to be checked at the same time.                                                                         | Optional             |
| `-s`, `--service-url` | The service URL of the Pulsar cluster.                                                                                       | Required             |
| `-z`, `--zookeeper`   | The ZooKeeper connection string that the cluster uses.                                                                       | Required             |
| `-t`, `--tenant`      | The tenant of the topic that you want to check.                                                                              | Optional             |
| `-n`, `--namespace`   | The namespace of the topic that you want to check.                                                                           | Optional             |
| `-t`, `--topic`       | The topic that you want to check.                                                                                            | Required             |

<Note title="Note">
  If it is not a public bucket, you need to configure the credentials before executing the command.
</Note>

> * For GCS, you need to set `GOOGLE_CLOUD_PROJECT_ID` and `GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE` as environment variables. For example, if the bucket is located in the project `affable-ray-226821` and the key file path is `/pulsar/key.json`, configure them as follows:
>
> ```bash theme={null}
> export GOOGLE_CLOUD_PROJECT_ID=affable-ray-226821
> export GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE=/pulsar/key.json
> ```
>
> * For AWS S3, you can [configure the AWS credentials](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-credentials.html) as the environment variables.
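For AWS S3, a minimal sketch of the environment-variable approach might look like the following. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard variable names read by the AWS SDK for Java; the values shown are placeholders, and your setup may require additional settings that are not covered here.

```bash
# Placeholder values: replace them with your own credentials.
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
```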

This example shows how to detect missing ledgers in the `example` topic of a locally-deployed cluster.

**Input**

```bash theme={null}
/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools pck ts detect-missing-ledger -z localhost:2181 -b gs://pulsar-offload -s http://localhost:8080 -t example
```

The output is similar to:

**Output**

```bash theme={null}
Detected missing ledger in the topic [example]: []
```
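If you run the check from a script, you may want to act on the result rather than read it. The sketch below defines a hypothetical `has_missing_ledgers` helper that inspects the report on standard input. It assumes the report line has the shape shown in the sample output and that a non-empty result is printed between the final pair of brackets, which is an assumption rather than documented behavior.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exits 0 when a report line ends in a non-empty list.
# Assumes lines shaped like: Detected missing ledger in the topic [example]: []
has_missing_ledgers() {
  # ": \[" anchors on the list after the colon; ".+" requires at least one
  # character inside the brackets, so "[]" does not match.
  grep -Eq '^Detected missing ledger.*: \[.+]$'
}

# Example with the sample output from above (an empty list).
if printf '%s\n' 'Detected missing ledger in the topic [example]: []' | has_missing_ledgers; then
  echo "missing ledgers found"
else
  echo "no missing ledgers"
fi
```

In practice you would pipe the output of the `detect-missing-ledger` command into the helper instead of the `printf` sample.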

## Find orphan ledgers

PCK fetches the ledger information of the topics from ZooKeeper and gets the range of ledgers from the storage. PCK then compares the two to find orphan ledgers, that is, ledgers that are not used by any Pulsar topic. You can use the `find-orphan-ledger` command to find them.

<Note title="Note">
  * The `find-orphan-ledger` command is only available for Pulsar 2.3.0 or higher.
  * The `find-orphan-ledger` command is suitable for a Pulsar cluster that has a dedicated BookKeeper cluster. If multiple Pulsar clusters share the same BookKeeper cluster, do not use this command.
</Note>

### Find orphan ledgers from Bookie

You can use the `sn-pulsar-tools pck bookie find-orphan-ledger [options]` command to find the orphan ledgers from the Bookie.

This table lists available options for the `sn-pulsar-tools pck bookie find-orphan-ledger` command.

| Option                       | Description                                                                            | Required or optional |
| ---------------------------- | -------------------------------------------------------------------------------------- | -------------------- |
| `--auth-param`               | The authentication token for connecting to the Pulsar cluster.                         | Optional             |
| `--auth-plugin`              | The authentication method for connecting to the Pulsar cluster.                        | Optional             |
| `-c`, `--concurrency`        | The maximum number of concurrent operations.                                           | Optional             |
| `-d`, `--delete`             | Delete the orphan ledgers.                                                             | Optional             |
| `-s`, `--service-url`        | The service URL of the Pulsar cluster.                                                 | Required             |
| `-t`, `--stale-time`         | The minimum stale time (in days) for topic ledgers. By default, it is set to one week. | Optional             |
| `-z`, `--zookeeper`          | The ZooKeeper connection string that the cluster uses.                                 | Required             |
| `-zt`, `--zookeeper-timeout` | The ZooKeeper session timeout.                                                         | Optional             |

* This example shows how to find orphan ledgers from the Bookie of a locally-deployed cluster.

  **Input**

  ```bash theme={null}
  /pulsar/sn-pulsar-tools/bin/sn-pulsar-tools pck bookie find-orphan-ledger -z localhost:2181 -s http://localhost:8080
  ```

  The output is similar to:

  **Output**

  ```bash theme={null}
  Found the orphan ledger in bookkeeper which is not referenced by any pulsar component
  365 : []
  ```

  `365` is the ID of the orphan ledger and `[]` is the metadata of the ledger.

* This example shows how to delete orphan ledgers from the Bookie of a locally-deployed cluster.

  **Input**

  ```bash theme={null}
  /pulsar/sn-pulsar-tools/bin/sn-pulsar-tools pck bookie find-orphan-ledger -z localhost:2181 -s http://localhost:8080 -d -t 7
  ```

  The output is similar to:

  **Output**

  ```bash theme={null}
  Found the orphan ledger in bookkeeper which is not referenced by any pulsar component
  365 : []
  Deleted ledger 365
  ```
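Because `-d` deletes ledgers, it can be useful to review the reported ledger IDs before running the delete variant. The following sketch defines a hypothetical `orphan_ledger_ids` helper that extracts the IDs from the report. It assumes every orphan ledger is printed as `<id> : <metadata>`, as in the sample output above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ledger IDs from a find-orphan-ledger report.
# Assumes orphan ledgers are listed as "<id> : <metadata>" (see sample output).
orphan_ledger_ids() {
  # Keep only lines that start with a number followed by " : " and print
  # the number; header lines and "Deleted ledger ..." lines are ignored.
  sed -n 's/^[[:space:]]*\([0-9][0-9]*\) : .*$/\1/p'
}

# Example with the sample output from above.
printf '%s\n' \
  'Found the orphan ledger in bookkeeper which is not referenced by any pulsar component' \
  '365 : []' | orphan_ledger_ids   # prints 365
```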

### Find orphan ledgers from tiered storage

You can use the `sn-pulsar-tools pck ts find-orphan-ledger [options]` command to find the orphan ledgers from the tiered storage.

This table outlines available options for the `sn-pulsar-tools pck ts find-orphan-ledger` command.

| Option              | Description                                                                                                                  | Required or optional |
| ------------------- | ---------------------------------------------------------------------------------------------------------------------------- | -------------------- |
| `--auth-param`      | The authentication token for connecting to the Pulsar cluster.                                                               | Optional             |
| `--auth-plugin`     | The authentication method for connecting to the Pulsar cluster.                                                              | Optional             |
| `-b`, `--bucket`    | The bucket where the data is offloaded. `gs://` refers to a GCS bucket and `s3a://pulsar-offload` refers to an AWS S3 bucket. | Required             |
| `-z`, `--zookeeper` | The ZooKeeper connection string that the cluster uses.                                                                       | Required             |

<Note title="Note">
  If it is not a public bucket, you need to configure the credentials before executing the command.
</Note>

> * For GCS, you need to set `GOOGLE_CLOUD_PROJECT_ID` and `GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE` as environment variables. For example, if the bucket is located in the project `affable-ray-226821` and the key file path is `/pulsar/key.json`, configure them as follows:
>
> ```bash theme={null}
> export GOOGLE_CLOUD_PROJECT_ID=affable-ray-226821
> export GOOGLE_CLOUD_SERVICE_ACCOUNT_KEYFILE=/pulsar/key.json
> ```
>
> * For AWS S3, you can [configure the AWS credentials](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-credentials.html) as the environment variables.

This example shows how to find orphan ledgers in a GCS bucket.

**Input**

```bash theme={null}
/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools pck ts find-orphan-ledger -b gs://pulsar-offload -z localhost:2181
```

The output is similar to:

**Output**

```bash theme={null}
Found the orphan ledgers in tiered storage which is not referenced by any topic
sink
source
function
15e155d2-cc91-4527-8570-bd99f98666d0-ledger-0
217bb7eb-1652-4343-b305-c8b415b1d02f-ledger-4
77d007b2-486f-4a06-985f-f4ef25760f26-ledger-3
e73f032e-165a-40f1-89bd-2233765d70e3-ledger-1
e89fe315-d355-4590-971d-1b605c949ff4-ledger-2
15e155d2-cc91-4527-8570-bd99f98666d0-ledger-0-index
217bb7eb-1652-4343-b305-c8b415b1d02f-ledger-4-index
77d007b2-486f-4a06-985f-f4ef25760f26-ledger-3-index
e73f032e-165a-40f1-89bd-2233765d70e3-ledger-1-index
e89fe315-d355-4590-971d-1b605c949ff4-ledger-2-index
test-packages-cloud-storage-7042eed2-1a85-40c8-b522-4fe287d5835b
test-packages-cloud-storage-de27c5db-b793-4d62-a323-dac527eafb83
test-packages-cloud-storage-e63951df-5671-4bbe-999b-98e4ecfae698
test-packages-cloud-storage-8dd0f3a5-2c43-4c76-b315-a541cc50a755
```
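The sample report mixes entries named `<uuid>-ledger-<id>` (with an optional `-index` suffix) with other entries such as `sink` or `test-packages-cloud-storage-...`. If you only want the entries that follow the ledger naming, a filter like the one below can help. The `offload_ledger_objects` helper is hypothetical, and the naming pattern is inferred from the sample output only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only entries named "<uuid>-ledger-<id>" or
# "<uuid>-ledger-<id>-index". The pattern is inferred from the sample output.
offload_ledger_objects() {
  grep -E '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}-ledger-[0-9]+(-index)?$'
}

# Example with a few lines taken from the sample output above.
printf '%s\n' \
  'sink' \
  '15e155d2-cc91-4527-8570-bd99f98666d0-ledger-0' \
  '15e155d2-cc91-4527-8570-bd99f98666d0-ledger-0-index' \
  'test-packages-cloud-storage-7042eed2-1a85-40c8-b522-4fe287d5835b' \
  | offload_ledger_objects
```

The example prints the two `15e155d2-...` lines and drops the other two.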

## Delete under-replicated ledgers from the Bookie

You can use the `sn-pulsar-tools bookie delete-underreplicate-ledgers [options]` command to delete the under-replicated ledgers from the Bookie.

This table outlines available options for the `sn-pulsar-tools bookie delete-underreplicate-ledgers` command.

| Option      | Description                                                     | Required or optional |
| ----------- | --------------------------------------------------------------- | -------------------- |
| `-conf`     | The path to the BookKeeper configuration file.                  | Required             |
| `--dry-run` | Configure whether to run the process without any modifications. | Optional             |

This example shows how to preview the deletion of under-replicated ledgers from the Bookie by using the `--dry-run` option.

**Input**

```bash theme={null}
/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools bookie delete-underreplicate-ledgers --conf ./bookkeeper.conf --dry-run
```

The output is similar to:

**Output**

```bash theme={null}
Delete under replicate ledger [2] directly because the metadata is not found
Ledger [1] ensembles are [[192.168.3.2:3182]], but currently have bookies [[]]. The ledger is unrecoverable, remove it from bookie metadata.
[Dry-run] Delete the ledger [1] metadata and remove it out of the under replicate ledger list
```

If you remove the `--dry-run` option, the output is similar to:

**Output**

```bash theme={null}
Delete under replicate ledger [2] directly because the metadata is not found
Ledger [1] ensembles are [[192.168.3.2:3182]], but currently have bookies [[]]. The ledger is unrecoverable, remove it from bookie metadata.
Delete the ledger [1] metadata and remove it out of the under replicate ledger list
```
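Before running the command without `--dry-run`, you may want a short list of the ledgers that the dry run marked for deletion. The sketch below defines a hypothetical `dry_run_ledger_ids` helper. It only reads lines that carry the `[Dry-run]` prefix and assumes they have the shape shown in the sample output above; other lines in the report are ignored.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ledger IDs found on "[Dry-run]" lines.
# Assumes lines shaped like:
#   [Dry-run] Delete the ledger [1] metadata and remove it out of ...
dry_run_ledger_ids() {
  sed -n 's/^\[Dry-run] Delete the ledger \[\([0-9][0-9]*\)].*$/\1/p'
}

# Example with a line from the sample dry-run output above.
printf '%s\n' \
  '[Dry-run] Delete the ledger [1] metadata and remove it out of the under replicate ledger list' \
  | dry_run_ledger_ids   # prints 1
```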

## Load inactive topics

You can use the `sn-pulsar-tools topics load-inactive-topic [options]` command to load the inactive topics in the Pulsar broker. After the topics are loaded, the Pulsar broker checks the `ConsumedLedgers` of every topic, which helps delete data that has already reached the retention limit.

This table outlines available options for the `sn-pulsar-tools topics load-inactive-topic` command.

| Option                       | Description                                                                        | Required or optional |
| ---------------------------- | ---------------------------------------------------------------------------------- | -------------------- |
| `--auth-param`               | The authentication token for connecting to the Pulsar cluster.                     | Optional             |
| `--auth-plugin`              | The authentication method for connecting to the Pulsar cluster.                    | Optional             |
| `-i`, `--inactive-day`       | The minimum inactive time (in days) for topics. By default, it is set to one week. | Optional             |
| `-n`, `--namespace`          | The namespace of the topic that you want to check.                                 | Required             |
| `-s`, `--service-url`        | The service URL of the Pulsar cluster.                                             | Required             |
| `-t`, `--tenant`             | The tenant of the topic that you want to check.                                    | Required             |
| `-z`, `--zookeeper`          | The ZooKeeper connection string that the cluster uses.                             | Required             |
| `-zt`, `--zookeeper-timeout` | The ZooKeeper session timeout.                                                     | Optional             |

This example shows how to load inactive topics for a locally-deployed cluster.

**Input**

```bash theme={null}
/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools topics load-inactive-topic -z localhost:2181 -s http://localhost:8080 -t public -n default -i 7
```

The output is similar to:

**Output**

```bash theme={null}
Load inactive topic: persistent://public/default/horizon-partition-0
Load inactive topic: persistent://public/default/horizon-partition-1
Load inactive topic: persistent://public/default/horizon-partition-2
Load inactive topic: persistent://public/default/horizon-partition-3
```
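Because the tenant and namespace are both required, covering several namespaces means running the command once per namespace. The following sketch wraps the example command in a loop. The `load_inactive_topics` function and the `SN_PULSAR_TOOLS` variable are hypothetical conveniences, and the ZooKeeper address and service URL are the local values used in the example above.

```bash
#!/usr/bin/env bash
# Path taken from the examples above; override SN_PULSAR_TOOLS if your
# installation lives elsewhere. (The variable name is an assumption.)
SN_PULSAR_TOOLS="${SN_PULSAR_TOOLS:-/pulsar/sn-pulsar-tools/bin/sn-pulsar-tools}"

# Run load-inactive-topic once for each namespace of a tenant.
# Usage: load_inactive_topics <tenant> <inactive-days> <namespace>...
load_inactive_topics() {
  local tenant="$1" days="$2" ns
  shift 2
  for ns in "$@"; do
    "$SN_PULSAR_TOOLS" topics load-inactive-topic \
      -z localhost:2181 -s http://localhost:8080 \
      -t "$tenant" -n "$ns" -i "$days"
  done
}

# Example call (requires a running cluster):
#   load_inactive_topics public 7 default my-namespace
```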
