1. Operating StreamNative Platform

Configure storage

This document describes how to configure storage for StreamNative Platform.

Persistent storage volumes

You can use local PersistentVolumes (PVs) with storage classes, or the default Kubernetes StorageClass, to provision persistent storage for your data.

Local PVs and storage classes

Note

If you deploy a local Kubernetes cluster, you need to configure the local PVs and storage classes for persisting data to your local storage.

Pulsar cluster components such as BookKeeper and ZooKeeper require persistent storage of data. To persist data in Kubernetes, you need to use PVs. A PV contains the details of the storage that is available for use by the Pulsar cluster. A PV can be provisioned by an administrator statically, or dynamically using a StorageClass. A StorageClass provides a way for administrators to describe the "classes" of storage they offer. Different classes might map to Quality-of-Service (QoS) levels, to backup policies, or to arbitrary policies determined by the cluster administrators.

PVs are bound to Pods through PersistentVolumeClaims (PVCs). A PVC is a request for storage by a user, similar to how a Pod is a request for compute: Pods consume node resources and PVCs consume PV resources.
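
A PVC declares the size and storage class that a workload needs. As an illustrative sketch (the PVC name, storage class name, and size below are assumptions, not platform defaults):

```yaml
# Illustrative only: name, storage class, and size are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data-pvc        # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-ssd   # hypothetical storage class
  resources:
    requests:
      storage: 10Gi
```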

To configure the local PVs and storage classes, follow these steps.

  1. Preallocate local storage in each cluster node.

    The example below creates five Solid State Drive (SSD) volumes and five Hard Disk Drive (HDD) volumes.

    Note

    This code example is intended for test environments only. Configure your local storage to suit your production environment.

    #!/bin/bash
    # Pre-create five SSD-backed volumes and bind-mount them so the
    # local volume provisioner can discover each one as a separate disk.
    for i in $(seq 1 5); do
      mkdir -p /mnt/ssd-bind/vol${i}
      mkdir -p /mnt/ssd/vol${i}
      mount --bind /mnt/ssd-bind/vol${i} /mnt/ssd/vol${i}
    done
    # Repeat for five HDD-backed volumes.
    for i in $(seq 1 5); do
      mkdir -p /mnt/hdd-bind/vol${i}
      mkdir -p /mnt/hdd/vol${i}
      mount --bind /mnt/hdd-bind/vol${i} /mnt/hdd/vol${i}
    done
    
  2. Install the local volume provisioner.

    Note

    The local volume provisioner manages the PV lifecycle for pre-allocated disks by detecting and creating PVs for each local disk on the host, and then cleaning up the disks when released. It does not support dynamic provisioning.

    a. Define a YAML file to configure the local volume provisioner.

    Here is an example of the YAML file used for configuring the local volume provisioner.
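
    The exact manifest depends on the provisioner release you install. As a hedged sketch following the kubernetes-sigs local static provisioner conventions, a ConfigMap mapping the discovery directories created in step 1 to two storage classes (the class names local-ssd and local-hdd are assumptions, not platform defaults) could look like:

```yaml
# Illustrative sketch only: adjust the namespace, class names, and
# paths to match your environment and provisioner version.
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-provisioner-config
  namespace: kube-system
data:
  storageClassMap: |
    local-ssd:
      hostDir: /mnt/ssd
      mountDir: /mnt/ssd
    local-hdd:
      hostDir: /mnt/hdd
      mountDir: /mnt/hdd
```

    The provisioner scans each hostDir and creates one PV per mounted volume it finds there, which is why step 1 bind-mounts each directory.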

    b. Apply the YAML file to install the local volume provisioner.

    kubectl apply -f /path/to/local-volume-provisioner/file.yaml

  3. Verify that the local volume provisioner is created successfully.

    kubectl get po -n <k8s_namespace> | grep local-volume
    
  4. Verify that all PVs are created successfully.

    kubectl get pv
    
  5. Verify that all storage classes are created successfully.

    kubectl get storageclasses
    

Kubernetes default StorageClass

If you do not set volumes.data.storageClassName in the values.yaml file, the Pulsar operator uses the default storage class.

Use the command below to get the name of the current default storage class:

kubectl get sc

To use the Kubernetes default storage class, it is recommended that you set the following properties on the default StorageClass:

  • volumeBindingMode: WaitForFirstConsumer
  • reclaimPolicy: Retain
  • allowVolumeExpansion: true (required for production deployments)
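
Expressed as a manifest, the recommended properties look like the following sketch; the class name and provisioner value are placeholders, not values shipped with the platform:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-default-class              # placeholder name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/no-provisioner  # placeholder; use your provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
allowVolumeExpansion: true
```

WaitForFirstConsumer delays volume binding until a Pod is scheduled, which keeps local volumes on the same node as the Pods that consume them.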

Multiple volumes

Pulsar uses Apache BookKeeper for persistent message storage. The BookKeeper server (bookie) uses ledgers and journals to manage data updates and transaction logs, and supports concurrent writes. To persist data and avoid data loss, you can configure multiple volumes, as well as multiple directories per volume, for journals and ledgers in the values.yaml file.

  • Volume configurations for journals

    bookkeeper:
      volumes:
        # use a persistent volume or emptyDir
        persistence: true
        journal:
          # It determines the directory of journal data
          numVolumes: 1 # --- [1]
          numDirsPerVolume: 1 # --- [2]
    
    • [1] numVolumes: the number of volumes supported for each journal.
    • [2] numDirsPerVolume: the number of directories per volume that BookKeeper writes its write-ahead log (WAL) to.
  • Volume configurations for ledgers

    bookkeeper:
      volumes:
        # use a persistent volume or emptyDir
        persistence: true
        ledgers:
          name: ledgers
          size: 50Gi
          # It determines the directory of ledgers data
          numVolumes: 1 # --- [1]
          numDirsPerVolume: 1 # --- [2]
    
    • [1] numVolumes: the number of volumes supported for each ledger.
    • [2] numDirsPerVolume: the number of directories per volume that BookKeeper writes ledger data to.
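
Combining both, a values.yaml fragment that spreads journal and ledger data across two volumes with two directories each might look like this sketch (the volume counts and ledger size are illustrative):

```yaml
bookkeeper:
  volumes:
    persistence: true
    journal:
      numVolumes: 2        # two journal volumes per bookie
      numDirsPerVolume: 2  # two WAL directories on each volume
    ledgers:
      name: ledgers
      size: 50Gi
      numVolumes: 2        # two ledger volumes per bookie
      numDirsPerVolume: 2  # two ledger directories on each volume
```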

Tiered Storage

Tiered Storage makes storing huge volumes of data in Pulsar manageable by reducing operational burden and cost. The fundamental idea is to separate data storage from data processing, allowing each to scale independently. With Tiered Storage, you can send data to cost-effective object storage, and scale brokers only when you need more compute resources.

StreamNative Platform supports the following object storage solutions for Tiered Storage:

  • AWS S3
  • Google Cloud Storage
  • Azure Blob Storage

Enable Tiered Storage

Starting from StreamNative Platform 1.3.0, you can enable Tiered Storage by setting broker.offload.enabled=true. When you enable Tiered Storage, you need to configure the type of blob storage to use and its related properties, such as the bucket or container, the region, and the credentials.

When a Pulsar cluster is deleted, StreamNative Platform does not garbage-collect the contents of the Tiered Storage bucket. You can either wait for the configured deletion interval or manually delete the objects in the Tiered Storage bucket.

To disable Tiered Storage, you can set broker.offload.enabled=false.

Configure Tiered Storage for AWS S3

Before enabling Tiered Storage on Amazon Web Services (AWS) with Amazon Simple Storage Service (S3) buckets, you need to configure the following:

  • Generate an AWS access key and secret access key.

  • Create an AWS S3 bucket.

  • Create a Kubernetes secret to save your AWS credentials with the command below. When you configure Tiered Storage, you can specify the Kubernetes secret. Pulsar brokers use the credentials stored in the Kubernetes secret to access the storage container. When your storage credentials change, you need to restart the Pulsar cluster.

    kubectl -n <k8s_namespace> create secret generic \
      --from-literal=AWS_ACCESS_KEY_ID=<aws_access_key> \
      --from-literal=AWS_SECRET_ACCESS_KEY=<aws_secret_key> \
      [secret name]
    

To enable Tiered Storage for AWS S3, set the following fields in the values.yaml file:

broker:
  offload:
    enabled: true
    managedLedgerMinLedgerRolloverTimeMinutes: ''
    managedLedgerMaxEntriesPerLedger: ''
    managedLedgerOffloadDriver: ''
    s3:
      enabled: true
      s3ManagedLedgerOffloadRegion: '[YOUR REGION OF S3]'
      s3ManagedLedgerOffloadBucket: '[YOUR BUCKET OF S3]'
      s3ManagedLedgerOffloadMaxBlockSizeInBytes: ''
      s3ManagedLedgerOffloadReadBufferSizeInBytes: ''
      s3ManagedLedgerOffloadServiceEndpoint: ''
      secret: '[the name of the created Kubernetes secret]'

The following list outlines the fields available for configuring Tiered Storage for AWS S3.

  • broker.offload.enabled: Enable Tiered Storage. Default: false. Required.
  • broker.offload.managedLedgerMinLedgerRolloverTimeMinutes: The minimum time between ledger rollovers for a topic. It is not recommended to set this field in production environments. Default: 10. Optional.
  • broker.offload.managedLedgerMaxEntriesPerLedger: The maximum number of entries to append to a ledger before triggering a rollover. It is not recommended to set this field in production environments. Default: 50000. Optional.
  • broker.offload.managedLedgerOffloadDriver: The offloader driver name, which is case-insensitive. There is a third driver type, S3, which is identical to aws-s3 except that it requires you to specify an endpoint URL using the s3ManagedLedgerOffloadServiceEndpoint field. This is useful if you use an S3-compatible data store other than AWS S3. Default: aws-s3. Required.
  • broker.offload.s3.enabled: Enable Tiered Storage for AWS S3. Default: false. Required.
  • broker.offload.s3.s3ManagedLedgerOffloadRegion: The AWS S3 bucket region. Before specifying a value for this field, set s3ManagedLedgerOffloadServiceEndpoint (for example, s3ManagedLedgerOffloadServiceEndpoint=https://s3.YOUR_REGION.amazonaws.com) and grant the GetBucketLocation permission to the user; otherwise, you might get an error. For details about how to grant the GetBucketLocation permission to a user, see bucket operations. Default: N/A. Optional.
  • broker.offload.s3.s3ManagedLedgerOffloadBucket: The AWS S3 bucket. Default: N/A. Required.
  • broker.offload.s3.s3ManagedLedgerOffloadMaxBlockSizeInBytes: The maximum size of a block sent during a multi-block upload to AWS S3. It cannot be smaller than 5 MB. Default: 64 MB. Required.
  • broker.offload.s3.s3ManagedLedgerOffloadReadBufferSizeInBytes: The block size for each individual read when reading data from AWS S3. Default: 1 MB. Required.
  • broker.offload.s3.s3ManagedLedgerOffloadServiceEndpoint: An alternative AWS S3 endpoint to connect to (for testing purposes). Default: N/A. Required.
  • broker.offload.s3.secret: The Kubernetes secret that stores the AWS credentials. Default: N/A. Required.
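
As a hedged fragment illustrating the pairing of s3ManagedLedgerOffloadRegion with a matching service endpoint described above (the region value is an example only):

```yaml
broker:
  offload:
    s3:
      enabled: true
      s3ManagedLedgerOffloadRegion: 'us-west-2'  # example region
      s3ManagedLedgerOffloadServiceEndpoint: 'https://s3.us-west-2.amazonaws.com'
```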

Configure Tiered Storage for Google Cloud Storage

Before enabling Tiered Storage with Google Cloud Storage (GCS), you need to configure the following:

  • Create a GCS service account.

  • Create a GCS bucket.

  • Create a Kubernetes secret to save your Google credentials with the following command. When you configure Tiered Storage, you can specify the Kubernetes secret. Pulsar brokers use the credentials stored in the Kubernetes secret to access the storage container. When your storage credentials change, you need to restart the Pulsar cluster.

    kubectl -n <k8s_namespace> create secret generic \
      --from-file=<gcs_service_account_path> \
      [secret name]
    

To enable Tiered Storage for Google Cloud Storage, set the following fields in the values.yaml file:

broker:
  offload:
    enabled: true
    managedLedgerMinLedgerRolloverTimeMinutes: ''
    managedLedgerMaxEntriesPerLedger: ''
    managedLedgerOffloadDriver: ''
    gcs:
      enabled: true
      gcsManagedLedgerOffloadRegion: '[YOUR REGION OF GCS]'
      gcsManagedLedgerOffloadBucket: '[YOUR BUCKET OF GCS]'
      gcsManagedLedgerOffloadMaxBlockSizeInBytes: ''
      gcsManagedLedgerOffloadReadBufferSizeInBytes: ''
      secret: '[the name of the created Kubernetes secret]'

The following list outlines the fields available for configuring Tiered Storage for Google Cloud Storage.

  • broker.offload.enabled: Enable Tiered Storage. Default: false. Required.
  • broker.offload.managedLedgerMinLedgerRolloverTimeMinutes: The minimum time between ledger rollovers for a topic. It is not recommended to set this field in production environments. Default: 10. Optional.
  • broker.offload.managedLedgerMaxEntriesPerLedger: The maximum number of entries to append to a ledger before triggering a rollover. It is not recommended to set this field in production environments. Default: 50000. Optional.
  • broker.offload.managedLedgerOffloadDriver: The offloader driver name, which is case-insensitive. Default: google-cloud-storage. Required.
  • broker.offload.gcs.enabled: Enable Tiered Storage for Google Cloud Storage. Default: false. Required.
  • broker.offload.gcs.gcsManagedLedgerOffloadRegion: The Google Cloud Storage bucket region. Default: N/A. Required.
  • broker.offload.gcs.gcsManagedLedgerOffloadBucket: The Google Cloud Storage bucket. Default: N/A. Required.
  • broker.offload.gcs.gcsManagedLedgerOffloadMaxBlockSizeInBytes: The maximum size of a block sent during a multi-block upload to Google Cloud Storage. It cannot be smaller than 5 MB. Default: 64 MB. Optional.
  • broker.offload.gcs.gcsManagedLedgerOffloadReadBufferSizeInBytes: The block size for each individual read when reading data from Google Cloud Storage. Default: 1 MB. Optional.
  • broker.offload.gcs.secret: The Kubernetes secret that stores the Google credentials. Default: N/A. Required.

Configure Tiered Storage for Azure Blob Storage

Before enabling Tiered Storage with Azure Blob Storage, you need to configure the following:

  • Create an Azure storage account and a storage account access key.

  • Create an Azure Blob container.

  • Create a Kubernetes secret to save your Azure credentials with the command below. When you configure Tiered Storage, you can specify the Kubernetes secret. Pulsar brokers use the credentials stored in the Kubernetes secret to access the storage container. When your storage credentials change, you need to restart the Pulsar cluster.

    kubectl -n <k8s_namespace> create secret generic \
      --from-literal=AZURE_STORAGE_ACCOUNT=<azure_storage_account> \
      --from-literal=AZURE_STORAGE_ACCESS_KEY=<azure_storage_access_key> \
      [secret name]
    

To enable Tiered Storage for Azure Blob Storage, set the following fields in the values.yaml file:

broker:
  offload:
    enabled: true
    managedLedgerMinLedgerRolloverTimeMinutes: ''
    managedLedgerMaxEntriesPerLedger: ''
    managedLedgerOffloadDriver: ''
    azureblob:
      enabled: true
      managedLedgerOffloadBucket: '[YOUR BLOB CONTAINER]'
      managedLedgerOffloadMaxBlockSizeInBytes: ''
      managedLedgerOffloadReadBufferSizeInBytes: ''
      managedLedgerOffloadServiceEndpoint: ''
      secret: '[the name of the created Kubernetes secret]'

The following list outlines the fields available for configuring Tiered Storage for Azure Blob Storage.

  • broker.offload.enabled: Enable Tiered Storage. Default: false. Required.
  • broker.offload.managedLedgerMinLedgerRolloverTimeMinutes: The minimum time between ledger rollovers for a topic. It is not recommended to set this field in production environments. Default: 10. Optional.
  • broker.offload.managedLedgerMaxEntriesPerLedger: The maximum number of entries to append to a ledger before triggering a rollover. It is not recommended to set this field in production environments. Default: 50000. Optional.
  • broker.offload.managedLedgerOffloadDriver: The offloader driver name, which is case-insensitive. Default: azureblob. Required.
  • broker.offload.azureblob.enabled: Enable Tiered Storage for Azure Blob Storage. Default: false. Required.
  • broker.offload.azureblob.managedLedgerOffloadBucket: The Azure Blob container. Default: N/A. Required.
  • broker.offload.azureblob.managedLedgerOffloadMaxBlockSizeInBytes: The maximum size of a block sent during a multi-block upload to Azure Blob Storage. It cannot be smaller than 5 MB. Default: 64 MB. Optional.
  • broker.offload.azureblob.managedLedgerOffloadReadBufferSizeInBytes: The block size for each individual read when reading data from Azure Blob Storage. Default: 1 MB. Optional.
  • broker.offload.azureblob.managedLedgerOffloadServiceEndpoint: An alternative Azure Blob Storage endpoint to connect to (for testing purposes). Default: N/A. Required.
  • broker.offload.azureblob.secret: The Kubernetes secret that stores the Azure credentials. Default: N/A. Required.