# Configure storage

This document describes how to configure storage for StreamNative Platform.

## Persistent storage volumes

You can use local [PersistentVolumes (PVs)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) and storage classes, or the default Kubernetes StorageClass, to provision persistent storage for your data.

### Local PVs and storage classes

<Note title="Note">
  If you deploy a local Kubernetes cluster, you need to configure the local PVs and storage classes for persisting data to your local storage.
</Note>

Pulsar cluster components such as BookKeeper and ZooKeeper need persistent storage for their data. To persist data in Kubernetes, you use PVs. A PV describes a piece of storage that is available to the Pulsar cluster. An administrator can provision PVs statically, or they can be provisioned dynamically through a [StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/). A StorageClass lets administrators describe the "classes" of storage they offer. Different classes might map to Quality-of-Service (QoS) levels, backup policies, or arbitrary policies determined by the cluster administrators.

Pods use PVs through [PersistentVolumeClaims (PVCs)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims). A PVC is a user's request for storage. PVCs are similar to Pods: Pods consume node resources, and PVCs consume PV resources.

To configure the local PVs and storage classes, follow these steps.

1. Preallocate local storage in each cluster node.

   The following example creates five Solid State Drive (SSD) volumes and five Hard Disk Drive (HDD) volumes.

   <Note title="Note">
     This example is intended for test environments only. In production, configure local storage to suit your environment.
   </Note>

   ```bash
   #!/bin/bash
   # Run as root on each node that provides local storage.
   set -euo pipefail

   # The local volume provisioner only discovers mount points, so each
   # directory under /mnt/ssd-bind is bind-mounted to /mnt/ssd/vol<N>.
   for i in $(seq 1 5); do
     mkdir -p /mnt/ssd-bind/vol${i}
     mkdir -p /mnt/ssd/vol${i}
     mount --bind /mnt/ssd-bind/vol${i} /mnt/ssd/vol${i}
   done

   # Repeat the same layout for the HDD volumes under /mnt/hdd.
   for i in $(seq 1 5); do
     mkdir -p /mnt/hdd-bind/vol${i}
     mkdir -p /mnt/hdd/vol${i}
     mount --bind /mnt/hdd-bind/vol${i} /mnt/hdd/vol${i}
   done
   ```

2. Install the local volume provisioner.

   <Note title="Note">
     The local volume provisioner manages the PV lifecycle for preallocated disks: it detects each local disk on the host, creates a PV for it, and cleans up the disk when the PV is released. It does not support dynamic provisioning.
   </Note>

   a. Define a YAML file to configure the local volume provisioner.

   For an example YAML file, see the [StreamNative examples repository](https://github.com/streamnative/examples/tree/master/platform).

   b. Apply the YAML file to install the local volume provisioner.

   ```bash
   kubectl apply -f /path/to/local-volume-provisioner/file.yaml
   ```

3. Verify that the local volume provisioner Pods are running.

   ```bash
   kubectl get pods -n <k8s_namespace> | grep local-volume
   ```

4. Verify that all PVs are created successfully.

   ```bash
   kubectl get pv
   ```

5. Verify that all storage classes are created successfully.

   ```bash
   kubectl get storageclasses
   ```
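The StorageClasses and PVs you see in steps 4 and 5 come from the provisioner YAML you applied in step 2. As a rough sketch of what such a configuration contains, assuming the provisioner follows the ConfigMap format of the Kubernetes sig-storage local static provisioner, the following maps each storage class to the discovery directory prepared in step 1. The names `local-ssd`, `local-hdd`, `local-provisioner-config`, and the `kube-system` namespace are hypothetical; check the example file for the names it actually uses.

```yaml
# Hypothetical StorageClass for the SSD volumes prepared in step 1.
# Define local-hdd the same way for the HDD volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-ssd
# Local PVs are created by the provisioner, not dynamically.
provisioner: kubernetes.io/no-provisioner
# Delay binding until a Pod is scheduled, so the PV is chosen on the Pod's node.
volumeBindingMode: WaitForFirstConsumer
# Delete lets the provisioner clean up and re-create the PV when it is released.
reclaimPolicy: Delete
---
# Hypothetical provisioner config: each StorageClass maps to the directory
# whose mount points (vol1..vol5) the provisioner turns into PVs.
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-provisioner-config
  namespace: kube-system
data:
  storageClassMap: |
    local-ssd:
      hostDir: /mnt/ssd
      mountDir: /mnt/ssd
    local-hdd:
      hostDir: /mnt/hdd
      mountDir: /mnt/hdd
```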

### Kubernetes default StorageClass

If you do not specify `volumes.data.storageClassName` in the `values.yaml` file, the Pulsar operator uses the default StorageClass.

Run the following command to list the StorageClasses. The current default is marked `(default)` next to its name.

```bash
kubectl get sc
```

If you use the default StorageClass, we recommend setting the following properties on it:

* `volumeBindingMode: WaitForFirstConsumer`
* `reclaimPolicy: Retain`
* `allowVolumeExpansion: true` (required for production deployments)
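A minimal sketch of a StorageClass with these recommended settings follows. The class name `standard-retain` and the `ebs.csi.aws.com` provisioner (the Amazon EBS CSI driver) are assumptions; substitute the provisioner your cluster uses. Because fields such as `reclaimPolicy` and `volumeBindingMode` cannot be changed on an existing StorageClass, you may need to create a new class like this one and mark it as the default (removing the default annotation from the old class) rather than editing the current default in place.

```yaml
# Hypothetical default StorageClass with the recommended settings.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-retain
  annotations:
    # Marks this class as the cluster-wide default.
    storageclass.kubernetes.io/is-default-class: "true"
# Assumed CSI driver; replace with the one used by your platform.
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
allowVolumeExpansion: true
```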

### Multiple volumes

Pulsar uses [Apache BookKeeper](https://bookkeeper.apache.org/) for persistent message storage. The BookKeeper server (bookie) uses [ledgers](https://bookkeeper.apache.org/docs/getting-started/concepts#ledgers) and [journals](https://bookkeeper.apache.org/docs/getting-started/concepts#journals) to store entry data and transaction logs. Because BookKeeper supports concurrent writes, you can configure multiple volumes, and multiple directories on each volume, for journals and ledgers in the `values.yaml` file to persist data and avoid data loss.

* Volume configurations for journals

  ```yaml theme={null}
  bookkeeper:
    volumes:
      # use a persistent volume or emptyDir
      persistence: true
      journal:
        # It determines the directory of journal data
        numVolumes: 1 # --- [1]
        numDirsPerVolume: 1 # --- [2]
  ```

  * \[1] `numVolumes`: the number of volumes used for journals.
  * \[2] `numDirsPerVolume`: the number of journal directories on each volume. BookKeeper writes its Write-Ahead Log (WAL) to these directories.

* Volume configurations for ledgers

  ```yaml theme={null}
  bookkeeper:
    volumes:
      # use a persistent volume or emptyDir
      persistence: true
      ledgers:
        name: ledgers
        size: 50Gi
        # It determines the directory of ledgers data
        numVolumes: 1 # --- [1]
        numDirsPerVolume: 1 # --- [2]
  ```

  * \[1] `numVolumes`: the number of volumes used for ledgers.
  * \[2] `numDirsPerVolume`: the number of ledger directories on each volume. BookKeeper stores ledger data in these directories.
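As a sketch, the following values spread journals and ledgers across more directories. It assumes that BookKeeper creates `numDirsPerVolume` directories on each of the `numVolumes` volumes, which would give each bookie two journal directories and 2 * 2 = 4 ledger directories, and that `size` applies to each ledger volume. Check the resulting PVCs (`kubectl get pvc`) to confirm how your operator version interprets these fields.

```yaml
bookkeeper:
  volumes:
    persistence: true
    journal:
      # Assumed: 1 volume x 2 directories = 2 journal directories per bookie.
      numVolumes: 1
      numDirsPerVolume: 2
    ledgers:
      name: ledgers
      # Assumed to be the size of each ledger volume.
      size: 50Gi
      # Assumed: 2 volumes x 2 directories = 4 ledger directories per bookie.
      numVolumes: 2
      numDirsPerVolume: 2
```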

### PVC metadata

You can add custom annotations and labels to BookKeeper PVC resources. Currently, only BookKeeper PVCs support this feature.

To configure PVC metadata for BookKeeper, add the `metadata` field under `journal` and/or `ledgers` in the `values.yaml` file:

```yaml theme={null}
bookkeeper:
  volumes:
    journal:
      metadata:
        annotations:
          example.com/annotation-key: "annotation-value"
        labels:
          example.com/label-key: "label-value"
    ledgers:
      metadata:
        annotations:
          example.com/annotation-key: "annotation-value"
        labels:
          example.com/label-key: "label-value"
```

The operator adds the configured annotations and labels to the PVCs that it creates for the BookKeeperCluster CR.

## Tiered Storage

[Tiered Storage](https://pulsar.apache.org/docs/tiered-storage-overview/) makes storing huge volumes of data in Pulsar manageable by reducing operational burden and cost. The fundamental idea is to separate data storage from data processing, allowing each to scale independently. With Tiered Storage, you can send data to cost-effective object storage, and scale brokers only when you need more compute resources.

StreamNative Platform supports the following object storage solutions for Tiered Storage:

* AWS S3
* Google Cloud Storage
* Azure Blob Storage

### Enable Tiered Storage

Starting with StreamNative Platform 1.3.0, you can enable Tiered Storage by setting `broker.offload.enabled=true`. When you enable Tiered Storage, you also need to configure the type of blob storage to use and its related properties, such as the bucket or container, the region, and the credentials.

When a Pulsar cluster is deleted, StreamNative Platform does not garbage-collect the contents of the Tiered Storage bucket. You can either wait for the configured deletion interval to elapse or manually delete the objects in the bucket.

To disable Tiered Storage, you can set `broker.offload.enabled=false`.
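In `values.yaml`, the `broker.offload.enabled` setting corresponds to the nested keys in this fragment. It only toggles the feature; a working configuration also needs one of the provider-specific blocks described in the following sections.

```yaml
broker:
  offload:
    # true enables Tiered Storage; set it to false to disable Tiered Storage.
    enabled: true
```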

### Configure Tiered Storage for AWS S3

Before enabling Tiered Storage on Amazon Web Services (AWS) with Amazon Simple Storage Service (Amazon S3), complete the following prerequisites:

* Create an [AWS S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html).
* Create an IAM role in your AWS account and attach the following IAM policy to it. The policy grants the permissions needed to access the S3 bucket; replace `<bucket-name>` with the name of your bucket.

  ```json theme={null}
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::<bucket-name>"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:DeleteObject"
                ],
                "Resource": [
                    "arn:aws:s3:::<bucket-name>/*"
                ]
            }
        ]
    }
  ```

To enable Tiered Storage for AWS S3, set the following fields in the `values.yaml` file:

```yaml theme={null}
broker:
  offload:
    enabled: true
    managedLedgerMinLedgerRolloverTimeMinutes: ''
    managedLedgerMaxEntriesPerLedger: ''
    managedLedgerOffloadDriver: ''
    s3:
      enabled: true
      s3ManagedLedgerOffloadRegion: '[YOUR REGION OF S3]'
      s3ManagedLedgerOffloadBucket: '[YOUR BUCKET OF S3]'
      s3ManagedLedgerOffloadMaxBlockSizeInBytes: ''
      s3ManagedLedgerOffloadReadBufferSizeInBytes: ''
      s3ManagedLedgerOffloadServiceEndpoint: ''
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: <your-custom-role-arn>
```

This table outlines fields available for configuring Tiered Storage for AWS S3.

| Field                                                           | Description                                                                                                                                             | Default | Required or not |
| --------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- | --------------- |
| `broker.offload.enabled`                                        | Enable Tiered Storage.                                                                                                                                  | `false` | Required        |
| `broker.offload.s3.enabled`                                     | Enable Tiered Storage for AWS S3.                                                                                                                       | `false` | Required        |
| `broker.offload.s3.s3ManagedLedgerOffloadBucket`                | The AWS S3 bucket.                                                                                                                                      | N/A     | Required        |
| `broker.offload.s3.s3ManagedLedgerOffloadRegion`                | The AWS S3 region.                                                                                                                                      | N/A     | Required        |
| `broker.offload.s3.s3ManagedLedgerOffloadMaxBlockSizeInBytes`   | The maximum size of a block sent during a multipart upload to AWS S3. It cannot be smaller than 5 MB.                                                  | 64 MB   | Optional        |
| `broker.offload.s3.s3ManagedLedgerOffloadReadBufferSizeInBytes` | The block size for each individual read when reading data from AWS S3.                                                                                  | 1 MB    | Optional        |
| `broker.offload.s3.s3ManagedLedgerOffloadServiceEndpoint`       | An alternative AWS S3 endpoint to connect to (for testing).                                                                                             | N/A     | Optional        |
| `broker.serviceAccount.annotations.eks.amazonaws.com/role-arn`  | The ARN of the IAM role that brokers assume to access the S3 bucket. The broker ServiceAccount is annotated with this ARN so that broker Pods can assume the role through IAM Roles for Service Accounts (IRSA) on Amazon EKS. | N/A     | Required        |
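For illustration, a filled-in S3 configuration might look like the following. The region, bucket name, AWS account ID, and role name are hypothetical placeholders. Optional fields left empty in the template are omitted here, on the assumption that the chart then applies the defaults listed in the table. The block size is set explicitly only to show that the value is expressed in bytes (64 * 1024 * 1024).

```yaml
broker:
  offload:
    enabled: true
    s3:
      enabled: true
      # Hypothetical values: use your own region and bucket.
      s3ManagedLedgerOffloadRegion: us-west-2
      s3ManagedLedgerOffloadBucket: my-pulsar-offload
      # 64 MB expressed in bytes.
      s3ManagedLedgerOffloadMaxBlockSizeInBytes: '67108864'
  serviceAccount:
    annotations:
      # Hypothetical role ARN: the IAM role that carries the policy above.
      eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/pulsar-offload
```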

### Configure Tiered Storage for Google GCS

Before enabling Tiered Storage with Google Cloud Storage (GCS), you need to configure the following:

* Create a [Google Cloud service account](https://cloud.google.com/iam/docs/creating-managing-service-accounts#creating) that can access the bucket, and download its JSON key file.
* Create a [GCS bucket](https://cloud.google.com/storage/docs/creating-buckets).
* Create a Kubernetes secret that stores your Google credentials with the following command. When you configure Tiered Storage, you specify this secret, and Pulsar brokers use the credentials it stores to access the storage bucket. If your storage credentials change, you need to restart the Pulsar cluster.

  ```bash theme={null}
  kubectl -n <k8s_namespace> create secret generic \
    --from-file=<gcs_service_account_path> \
    <secret_name>
  ```

To enable Tiered Storage for Google Cloud Storage, set the following fields in the `values.yaml` file:

```yaml theme={null}
broker:
  offload:
    enabled: true
    managedLedgerMinLedgerRolloverTimeMinutes: ''
    managedLedgerMaxEntriesPerLedger: ''
    managedLedgerOffloadDriver:
    gcs:
      enabled: true
      gcsManagedLedgerOffloadRegion: '[YOUR REGION OF GCS]'
      gcsManagedLedgerOffloadBucket: '[YOUR BUCKET OF GCS]'
      gcsManagedLedgerOffloadMaxBlockSizeInBytes: ''
      gcsManagedLedgerOffloadReadBufferSizeInBytes: ''
      secret: '[the name of the created Kubernetes secret]'
```

This table outlines fields available for configuring Tiered Storage for Google Cloud Storage.

| Field                                                             | Description                                                                                                                      | Default                | Required or not |
| ----------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------- | ---------------------- | --------------- |
| `broker.offload.enabled`                                          | Enable Tiered Storage.                                                                                                           | `false`                | Required        |
| `broker.offload.managedLedgerOffloadDriver`                       | The offloader driver name, which is case-insensitive.                                                                            | `google-cloud-storage` | Required        |
| `broker.offload.gcs.enabled`                                      | Enable Tiered Storage for Google Cloud Storage.                                                                                  | `false`                | Required        |
| `broker.offload.gcs.gcsManagedLedgerOffloadRegion`                | The Google Cloud Storage [bucket region](https://cloud.google.com/storage/docs/locations#location-mr).                           | N/A                    | Required        |
| `broker.offload.gcs.gcsManagedLedgerOffloadBucket`                | The Google Cloud Storage bucket.                                                                                                 | N/A                    | Required        |
| `broker.offload.gcs.gcsManagedLedgerOffloadMaxBlockSizeInBytes`   | The maximum size of a block sent during a multipart upload to Google Cloud Storage. It cannot be smaller than 5 MB.             | 64 MB                  | Optional        |
| `broker.offload.gcs.gcsManagedLedgerOffloadReadBufferSizeInBytes` | The block size for each individual read when reading data from Google Cloud Storage.                                             | 1 MB                   | Optional        |
| `broker.offload.gcs.secret`                                       | The Kubernetes secret that stores the Google credentials.                                                                        | N/A                    | Required        |
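For illustration, a filled-in Google Cloud Storage configuration might look like the following. The bucket location, bucket name, and secret name are hypothetical placeholders; the driver value is the default listed in the table. Optional fields are omitted on the assumption that the chart then applies their defaults.

```yaml
broker:
  offload:
    enabled: true
    managedLedgerOffloadDriver: google-cloud-storage
    gcs:
      enabled: true
      # Hypothetical values: use your own bucket location and name.
      gcsManagedLedgerOffloadRegion: us-central1
      gcsManagedLedgerOffloadBucket: my-pulsar-offload
      # Hypothetical name of the Secret that holds the service account key.
      secret: pulsar-gcs-credentials
```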

### Configure Tiered Storage for Azure Blob Storage

Before enabling Tiered Storage with Azure Blob Storage, you need to configure the following:

* Create an [Azure storage account](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-create?tabs=azure-portal) and a [storage account access key](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal).
* Create an [Azure Blob container](https://docs.microsoft.com/en-us/azure/storage/blobs/blob-containers-cli#create-a-container).
* Create a Kubernetes secret that stores your Azure credentials with the following command. When you configure Tiered Storage, you specify this secret, and Pulsar brokers use the credentials it stores to access the storage container. If your storage credentials change, you need to restart the Pulsar cluster.

  ```bash theme={null}
  kubectl -n <k8s_namespace> create secret generic \
    --from-literal=AZURE_STORAGE_ACCOUNT=<azure_storage_account> \
    --from-literal=AZURE_STORAGE_ACCESS_KEY=<azure_storage_access_key> \
    <secret_name>
  ```
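If you manage resources declaratively, a Secret manifest equivalent to the `kubectl create secret generic` command above might look like the following sketch. The Secret name and namespace are hypothetical; the two keys match the `--from-literal` keys in the command.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: pulsar-azure-credentials   # hypothetical name
  namespace: pulsar                # hypothetical namespace
type: Opaque
# stringData accepts plain-text values; Kubernetes stores them base64-encoded.
stringData:
  AZURE_STORAGE_ACCOUNT: <azure_storage_account>
  AZURE_STORAGE_ACCESS_KEY: <azure_storage_access_key>
```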

To enable Tiered Storage for Azure Blob Storage, set the following fields in the `values.yaml` file:

```yaml theme={null}
broker:
  offload:
    enabled: true
    managedLedgerMinLedgerRolloverTimeMinutes: ''
    managedLedgerMaxEntriesPerLedger: ''
    managedLedgerOffloadDriver:
    azureblob:
      enabled: true
      managedLedgerOffloadBucket: '[YOUR BLOB CONTAINER]'
      managedLedgerOffloadMaxBlockSizeInBytes: ''
      managedLedgerOffloadReadBufferSizeInBytes: ''
      managedLedgerOffloadServiceEndpoint: ''
      secret: '[the name of the created Kubernetes secret]'
```

This table outlines fields available for configuring Tiered Storage for Azure Blob Storage.

| Field                                                                | Description                                                                                                                    | Default     | Required or not |
| -------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ | ----------- | --------------- |
| `broker.offload.enabled`                                             | Enable Tiered Storage.                                                                                                         | `false`     | Required        |
| `broker.offload.managedLedgerOffloadDriver`                          | The offloader driver name, which is case-insensitive.                                                                          | `azureblob` | Required        |
| `broker.offload.azureblob.enabled`                                   | Enable Tiered Storage for Azure Blob Storage.                                                                                  | `false`     | Required        |
| `broker.offload.azureblob.managedLedgerOffloadBucket`                | The [Azure Blob container](https://docs.microsoft.com/en-us/azure/storage/blobs/blob-containers-cli#create-a-container).       | N/A         | Required        |
| `broker.offload.azureblob.managedLedgerOffloadMaxBlockSizeInBytes`   | The maximum size of a block sent during a multipart upload to Azure Blob Storage. It cannot be smaller than 5 MB.             | 64 MB       | Optional        |
| `broker.offload.azureblob.managedLedgerOffloadReadBufferSizeInBytes` | The block size for each individual read when reading data from Azure Blob Storage.                                             | 1 MB        | Optional        |
| `broker.offload.azureblob.managedLedgerOffloadServiceEndpoint`       | An alternative Azure Blob Storage endpoint to connect to (for testing).                                                        | N/A         | Optional        |
| `broker.offload.azureblob.secret`                                    | The Kubernetes secret that stores the Azure credentials.                                                                       | N/A         | Required        |
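For illustration, a filled-in Azure Blob Storage configuration might look like the following. The container name and secret name are hypothetical placeholders; the driver value is the default listed in the table. Optional fields are omitted on the assumption that the chart then applies their defaults.

```yaml
broker:
  offload:
    enabled: true
    managedLedgerOffloadDriver: azureblob
    azureblob:
      enabled: true
      # Hypothetical container name.
      managedLedgerOffloadBucket: pulsar-offload
      # Hypothetical Secret holding AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_ACCESS_KEY.
      secret: pulsar-azure-credentials
```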
