# Configure storage

StreamNative Private Cloud uses Kubernetes StorageClasses to provision persistent storage volumes for ZooKeeper and BookKeeper.

## Use default Kubernetes StorageClass

By default, StreamNative Private Cloud uses the cluster's default Kubernetes StorageClass to provision persistent volumes for its Custom Resources (CRs).

Run the following command to list the StorageClasses in your cluster. The current default StorageClass is marked with `(default)` after its name.

```bash theme={null}
kubectl get sc
```

To change the default StorageClass used to provision volumes, see [Change the default StorageClass](https://kubernetes.io/docs/tasks/administer-cluster/change-default-storage-class/).
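
For reference, Kubernetes identifies the default StorageClass by the `storageclass.kubernetes.io/is-default-class` annotation. The fragment below is a sketch of the relevant metadata only; the name is a placeholder and the rest of the StorageClass definition is omitted.

```yaml
# Fragment of a StorageClass manifest; <storage-class-name> is a placeholder.
metadata:
  name: <storage-class-name>
  annotations:
    # Kubernetes treats the StorageClass carrying this annotation as the default.
    storageclass.kubernetes.io/is-default-class: "true"
```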

## Use specific Kubernetes StorageClass

Instead of the default, you can specify a StorageClass for ZooKeeper and BookKeeper. To use a specific Kubernetes StorageClass, follow these steps.

1. Create a StorageClass in your Kubernetes cluster, or choose an existing one. Creating a new StorageClass requires sufficient permissions to create and modify StorageClasses in the cluster.

2. In your ZooKeeper and BookKeeper CRs, specify the name of the StorageClass to use:

* ZooKeeperCluster

```yaml
spec:
  persistence:
    data:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 40Gi
      # Set a pre-defined Kubernetes Storage Class
      storageClassName: <Your Storage Class name>
    dataLog:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
      # Set a pre-defined Kubernetes Storage Class
      storageClassName: <Your Storage Class name>
```

* BookKeeperCluster

```yaml
spec:
  storage:
    journal:
      numDirsPerVolume: 1
      numVolumes: 1
      volumeClaimTemplate:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
        # Set a pre-defined Kubernetes Storage Class
        storageClassName: <Your Storage Class name>
    ledger:
      numDirsPerVolume: 1
      numVolumes: 1
      volumeClaimTemplate:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 80Gi
        # Set a pre-defined Kubernetes Storage Class
        storageClassName: <Your Storage Class name>
```
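
As an illustration of step 1, the following is a minimal sketch of a StorageClass manifest. It assumes a cluster that runs the AWS EBS CSI driver; the name `pulsar-ssd`, the provisioner, and the `parameters` are illustrative and should be replaced with values that match your environment.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pulsar-ssd            # hypothetical name; reference it in storageClassName
provisioner: ebs.csi.aws.com  # assumption: the AWS EBS CSI driver is installed
parameters:
  type: gp3                   # driver-specific parameter (EBS volume type)
reclaimPolicy: Delete
allowVolumeExpansion: true
# Delay binding until a pod is scheduled so the volume is created in the pod's zone.
volumeBindingMode: WaitForFirstConsumer
```

With a StorageClass like this in place, you would set `storageClassName: pulsar-ssd` in the ZooKeeperCluster and BookKeeperCluster CRs shown above.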

## PVC metadata

You can add custom annotations and labels to BookKeeper PVC resources. Currently, only BookKeeper PVCs support this feature.

To configure PVC metadata for BookKeeper, add the `metadata` field under `journal` and/or `ledger` in the BookKeeperCluster CR:

```yaml theme={null}
spec:
  storage:
    journal:
      metadata:
        annotations:
          example.com/annotation-key: "annotation-value"
        labels:
          example.com/label-key: "label-value"
      volumeClaimTemplate:
        # ... other settings
    ledger:
      metadata:
        annotations:
          example.com/annotation-key: "annotation-value"
        labels:
          example.com/label-key: "label-value"
      volumeClaimTemplate:
        # ... other settings
```

The configured annotations and labels are added to the PVC resources created for the BookKeeperCluster CR.
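
As a rough sketch, a PVC created for a journal or ledger volume would then carry metadata along these lines. This is an excerpt, not a complete manifest: the PVC name is a placeholder for the generated name, and the operator may add labels and annotations of its own.

```yaml
# Excerpt of a generated PVC; the spec section is omitted.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <generated-pvc-name>   # placeholder; the actual name is generated
  namespace: <namespace>
  annotations:
    example.com/annotation-key: "annotation-value"
  labels:
    example.com/label-key: "label-value"
```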

## Tiered Storage

[Tiered Storage](https://pulsar.apache.org/docs/tiered-storage-overview/) makes storing huge volumes of data in Pulsar manageable by reducing operational burden and cost. The fundamental idea is to separate data storage from data processing, allowing each to scale independently. With Tiered Storage, you can send data to cost-effective object storage, and scale brokers only when you need more compute resources.

StreamNative Private Cloud supports the following object storage solutions for Tiered Storage:

* AWS S3
* Google Cloud Storage
* Azure Blob Storage

### Enable Tiered Storage

To enable Tiered Storage, configure the type of object storage to use and its related properties, such as the bucket or container, the region, and the credentials, in the `PulsarBroker` CR.

When a Pulsar cluster is deleted, StreamNative Private Cloud does not garbage-collect the contents of the Tiered Storage bucket. You can either wait for the configured deletion interval to elapse or manually delete the objects in the Tiered Storage bucket.
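
Configuring a driver and bucket determines where data is offloaded; when brokers offload automatically is controlled by separate thresholds. The sketch below assumes that standard Apache Pulsar broker settings can be passed through `config.custom` in the same way as the driver settings in the sections that follow. The two keys are Apache Pulsar broker settings, and the values are illustrative only.

```yaml
spec:
  config:
    custom:
      # Offload automatically once a topic holds more than 10 GiB in BookKeeper
      # (10 * 1024^3 = 10737418240 bytes). In Apache Pulsar, the default of -1
      # disables automatic offloading.
      managedLedgerOffloadAutoTriggerSizeThresholdBytes: "10737418240"
      # How long an offloaded ledger is kept in BookKeeper before it is deleted
      # there (4 hours = 14400000 ms).
      managedLedgerOffloadDeletionLagMs: "14400000"
```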

### Configure Tiered Storage for AWS S3

Before enabling Tiered Storage with Amazon Simple Storage Service (Amazon S3) on Amazon Web Services (AWS), complete the following steps:

* Create an [AWS S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html).

* Create an IAM role in your AWS account and attach the following IAM policy to grant the necessary permissions for accessing the S3 bucket:

  ```json theme={null}
  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "s3:ListBucket"
              ],
              "Resource": [
                  "arn:aws:s3:::<bucket-name>"
              ]
          },
          {
              "Effect": "Allow",
              "Action": [
                  "s3:PutObject",
                  "s3:GetObject",
                  "s3:DeleteObject"
              ],
              "Resource": [
                  "arn:aws:s3:::<bucket-name>/*"
              ]
          }
      ]
  }
  ```

* Create a Kubernetes ServiceAccount with the IAM role annotation:

  ```yaml theme={null}
  apiVersion: v1
  kind: ServiceAccount
  metadata:
    annotations:
      eks.amazonaws.com/role-arn: <your-custom-role-arn>
    name: <service-account-name>
    namespace: <namespace>
  ```

To enable Tiered Storage for AWS S3, configure the `PulsarBroker` CR as follows:

```yaml theme={null}
apiVersion: pulsar.streamnative.io/v1alpha1
kind: PulsarBroker
metadata:
  name: <PulsarBroker name>
  namespace: <namespace>
spec:
  image: <Pulsar image version>
  replicas: 1
  zkServers: <ZooKeeper address>
  serviceAccountName: <service-account-name>
  config:
    custom:
      managedLedgerOffloadDriver: "aws-s3"
      managedLedgerMinLedgerRolloverTimeMinutes: "10"
      managedLedgerMaxEntriesPerLedger: "50000"
      offloadersDirectory: /pulsar/offloaders
      s3ManagedLedgerOffloadRegion: '<YOUR REGION OF S3>'
      s3ManagedLedgerOffloadBucket: '<YOUR BUCKET OF S3>'
      s3ManagedLedgerOffloadServiceEndpoint: "http://s3.amazonaws.com"
      s3ManagedLedgerOffloadMaxBlockSizeInBytes: '67108864'
      s3ManagedLedgerOffloadReadBufferSizeInBytes: '1048576'
```

This table outlines fields available for configuring Tiered Storage for AWS S3.

| Field                                                       | Description                                                                                                        | Default            | Required |
| ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ | ------------------ | -------- |
| `config.custom.managedLedgerOffloadDriver`                  | The offloader driver name. Set to `aws-s3` for AWS S3.                                                             | N/A                | Required |
| `config.custom.managedLedgerMinLedgerRolloverTimeMinutes`   | The minimum time in minutes to wait before rolling over a ledger.                                                  | "10"               | Optional |
| `config.custom.managedLedgerMaxEntriesPerLedger`            | The maximum number of entries to append to a ledger before triggering a rollover.                                  | "50000"            | Optional |
| `config.custom.offloadersDirectory`                         | The directory where offloader implementations are stored.                                                          | /pulsar/offloaders | Optional |
| `config.custom.s3ManagedLedgerOffloadBucket`                | The AWS S3 bucket.                                                                                                 | N/A                | Required |
| `config.custom.s3ManagedLedgerOffloadRegion`                | The AWS S3 region.                                                                                                 | N/A                | Required |
| `config.custom.s3ManagedLedgerOffloadMaxBlockSizeInBytes`   | The maximum size of a block sent during a multipart upload to AWS S3. It cannot be smaller than 5 MB.              | 64 MB              | Optional |
| `config.custom.s3ManagedLedgerOffloadReadBufferSizeInBytes` | The block size for each individual read when reading data from AWS S3.                                             | 1 MB               | Optional |
| `config.custom.s3ManagedLedgerOffloadServiceEndpoint`       | An alternative AWS S3 endpoint to connect to (for testing purposes).                                               | N/A                | Optional |
| `spec.serviceAccountName`                                   | The name of the Kubernetes ServiceAccount that is associated with the IAM role for assume-role authentication.     | N/A                | Required |
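
The size fields take byte counts, while the table gives the defaults in megabytes. The fragment below is a sketch of how the values relate, taking 1 MB as 1024 * 1024 = 1048576 bytes, which is the convention the defaults follow.

```yaml
spec:
  config:
    custom:
      # 64 MB = 64 * 1024 * 1024 = 67108864 bytes (the default).
      # The 5 MB minimum corresponds to 5 * 1024 * 1024 = 5242880 bytes.
      s3ManagedLedgerOffloadMaxBlockSizeInBytes: '67108864'
      # 1 MB = 1024 * 1024 = 1048576 bytes (the default).
      s3ManagedLedgerOffloadReadBufferSizeInBytes: '1048576'
```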

### Configure Tiered Storage for Google Cloud Storage

Before enabling Tiered Storage with Google Cloud Storage (GCS), complete the following steps:

* Create a [GCS service account](https://cloud.google.com/iam/docs/creating-managing-service-accounts#creating).
* Create a [GCS bucket](https://cloud.google.com/storage/docs/creating-buckets).
* Create a Kubernetes secret to store your Google credentials by running the following command. You reference this secret when you configure Tiered Storage, and Pulsar brokers use the credentials stored in it to access the bucket. If your storage credentials change, you need to restart the Pulsar cluster.

  ```bash theme={null}
  kubectl -n <k8s_namespace> create secret generic <secret-name> \
    --from-file=<gcs_service_account_path>
  ```

To enable Tiered Storage for Google Cloud Storage, configure the `PulsarBroker` CR as follows:

```yaml theme={null}
apiVersion: pulsar.streamnative.io/v1alpha1
kind: PulsarBroker
metadata:
  name: <PulsarBroker name>
  namespace: <namespace>
spec:
  image: <Pulsar image version>
  replicas: 1
  zkServers: <ZooKeeper address>
  config:
    custom:
      managedLedgerOffloadDriver: 'google-cloud-storage'
      managedLedgerMinLedgerRolloverTimeMinutes: "10"
      managedLedgerMaxEntriesPerLedger: "50000"
      offloadersDirectory: /pulsar/offloaders
      gcsManagedLedgerOffloadRegion: '<YOUR REGION OF GCS>'
      gcsManagedLedgerOffloadBucket: '<YOUR BUCKET OF GCS>'
      gcsManagedLedgerOffloadServiceAccountKeyFile: "/pulsar/srvaccts/gcs.json"
      gcsManagedLedgerOffloadMaxBlockSizeInBytes: '67108864'
      gcsManagedLedgerOffloadReadBufferSizeInBytes: '1048576'
  pod:
    secretRefs:
    - mountPath: /pulsar/srvaccts/gcs.json
      secretName: <secret-name>
```

This table outlines fields available for configuring Tiered Storage for Google Cloud Storage.

| Field                                                        | Description                                                                                                                      | Default                   | Required |
| ------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------- | ------------------------- | -------- |
| `config.custom.managedLedgerOffloadDriver`                   | The offloader driver name. Set to `google-cloud-storage` for GCS.                                                                | N/A                       | Required |
| `config.custom.managedLedgerMinLedgerRolloverTimeMinutes`    | The minimum time in minutes to wait before rolling over a ledger.                                                                | "10"                      | Optional |
| `config.custom.managedLedgerMaxEntriesPerLedger`             | The maximum number of entries to append to a ledger before triggering a rollover.                                                | "50000"                   | Optional |
| `config.custom.offloadersDirectory`                          | The directory where offloader implementations are stored.                                                                        | /pulsar/offloaders        | Optional |
| `config.custom.gcsManagedLedgerOffloadBucket`                | The Google Cloud Storage bucket.                                                                                                 | N/A                       | Required |
| `config.custom.gcsManagedLedgerOffloadRegion`                | The Google Cloud Storage [bucket region](https://cloud.google.com/storage/docs/locations#location-mr).                           | N/A                       | Required |
| `config.custom.gcsManagedLedgerOffloadServiceAccountKeyFile` | The path to the GCS service account key file.                                                                                    | /pulsar/srvaccts/gcs.json | Optional |
| `config.custom.gcsManagedLedgerOffloadMaxBlockSizeInBytes`   | The maximum size of a block sent during a multipart upload to Google Cloud Storage. It cannot be smaller than 5 MB.              | 64 MB                     | Optional |
| `config.custom.gcsManagedLedgerOffloadReadBufferSizeInBytes` | The block size for each individual read when reading data from Google Cloud Storage.                                             | 1 MB                      | Optional |
| `pod.secretRefs`                                             | Mount the GCS service account JSON file to `/pulsar/srvaccts/gcs.json`.                                                          | N/A                       | Required |

### Configure Tiered Storage for Azure Blob Storage

Before enabling Tiered Storage with Azure Blob Storage, complete the following steps:

* Create an [Azure storage account](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-create?tabs=azure-portal) and a [storage account access key](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal).
* Create an [Azure Blob container](https://docs.microsoft.com/en-us/azure/storage/blobs/blob-containers-cli#create-a-container).
* Create a Kubernetes secret to store your Azure credentials by running the following command. You reference this secret when you configure Tiered Storage, and Pulsar brokers use the credentials stored in it to access the storage container. If your storage credentials change, you need to restart the Pulsar cluster.

  ```bash theme={null}
  kubectl -n <k8s_namespace> create secret generic <secret-name> \
    --from-literal=AZURE_STORAGE_ACCOUNT=<azure_storage_account> \
    --from-literal=AZURE_STORAGE_ACCESS_KEY=<azure_storage_access_key>
  ```
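
If you manage resources declaratively, the same secret can be expressed as a manifest. The following is a sketch of an equivalent `Secret` that uses the same placeholders as the command above; `stringData` lets you supply the values without base64-encoding them.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: <secret-name>
  namespace: <k8s_namespace>
type: Opaque
stringData:
  # Key names must match the secretKeyRef keys used in the PulsarBroker CR.
  AZURE_STORAGE_ACCOUNT: <azure_storage_account>
  AZURE_STORAGE_ACCESS_KEY: <azure_storage_access_key>
```

Avoid committing a manifest with real credentials to version control.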

To enable Tiered Storage for Azure Blob Storage, configure the `PulsarBroker` CR as follows:

```yaml theme={null}
apiVersion: pulsar.streamnative.io/v1alpha1
kind: PulsarBroker
metadata:
  name: <PulsarBroker name>
  namespace: <namespace>
spec:
  image: <Pulsar image version>
  replicas: 1
  zkServers: <ZooKeeper address>
  config:
    custom:
      managedLedgerOffloadDriver: 'azureblob'
      managedLedgerMinLedgerRolloverTimeMinutes: "10"
      managedLedgerMaxEntriesPerLedger: "50000"
      offloadersDirectory: /pulsar/offloaders
      managedLedgerOffloadBucket: '<YOUR BLOB CONTAINER>'
      managedLedgerOffloadServiceEndpoint: "https://<azure_storage_account>.blob.core.windows.net"
      managedLedgerOffloadMaxBlockSizeInBytes: '67108864'
      managedLedgerOffloadReadBufferSizeInBytes: '1048576'
  pod:
    vars:
    - name: AZURE_STORAGE_ACCOUNT
      valueFrom:
        secretKeyRef:
          name: <secret-name>
          key: AZURE_STORAGE_ACCOUNT
    - name: AZURE_STORAGE_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: <secret-name>
          key: AZURE_STORAGE_ACCESS_KEY
```

This table outlines fields available for configuring Tiered Storage for Azure Blob Storage.

| Field                                                     | Description                                                                                                                    | Default            | Required |
| --------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ | ------------------ | -------- |
| `config.custom.managedLedgerOffloadDriver`                | The offloader driver name. Set to `azureblob` for Azure Blob Storage.                                                          | N/A                | Required |
| `config.custom.managedLedgerMinLedgerRolloverTimeMinutes` | The minimum time in minutes to wait before rolling over a ledger.                                                              | "10"               | Optional |
| `config.custom.managedLedgerMaxEntriesPerLedger`          | The maximum number of entries to append to a ledger before triggering a rollover.                                              | "50000"            | Optional |
| `config.custom.offloadersDirectory`                       | The directory where offloader implementations are stored.                                                                      | /pulsar/offloaders | Optional |
| `config.custom.managedLedgerOffloadBucket`                | The [Azure Blob container](https://docs.microsoft.com/en-us/azure/storage/blobs/blob-containers-cli#create-a-container).       | N/A                | Required |
| `config.custom.managedLedgerOffloadMaxBlockSizeInBytes`   | The maximum size of a block sent during a multipart upload to Azure Blob Storage. It cannot be smaller than 5 MB.              | 64 MB              | Optional |
| `config.custom.managedLedgerOffloadReadBufferSizeInBytes` | The block size for each individual read when reading data from Azure Blob Storage.                                             | 1 MB               | Optional |
| `config.custom.managedLedgerOffloadServiceEndpoint`       | An alternative Azure Blob Storage endpoint to connect to (for testing purposes).                                               | N/A                | Optional |
| `pod.vars`                                                | Environment variables to reference Azure credentials from the Kubernetes secret.                                               | N/A                | Required |
