Configure storage
This document describes how to configure storage for StreamNative Platform.
Persistent storage volumes
You can use local PersistentVolumes (PVs) with storage classes, or the default Kubernetes StorageClass, to provision persistent storage for your data.
Local PVs and storage classes
Note
If you deploy a local Kubernetes cluster, you need to configure the local PVs and storage classes for persisting data to your local storage.
Pulsar cluster components such as BookKeeper and ZooKeeper require persistent storage of data. To persist data in Kubernetes, you need to use PVs. A PV contains the details of the storage that is available for use by the Pulsar cluster. A PV can be provisioned by an administrator statically, or dynamically using a StorageClass. A StorageClass provides a way for administrators to describe the "classes" of storage they offer. Different classes might map to Quality-of-Service (QoS) levels, to backup policies, or to arbitrary policies determined by the cluster administrators.
PVs and Pods are bound by PersistentVolumeClaims (PVCs). A PVC is a request for storage by a user and is analogous to a Pod: Pods consume node resources, and PVCs consume PV resources.
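As an illustration, a statically provisioned local PV and a matching PVC might look like the following sketch (the names, node hostname, storage class, and path are hypothetical placeholders, not values required by StreamNative Platform):

```yaml
# Hypothetical statically provisioned local PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-ssd
  local:
    path: /mnt/ssd/vol1
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1
---
# A PVC requesting storage from the same storage class;
# Kubernetes binds it to a matching PV
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-local-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-ssd
  resources:
    requests:
      storage: 50Gi
```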
To configure the local PVs and storage classes, follow these steps.
Preallocate local storage on each cluster node.
The following example creates five Solid State Drive (SSD) volumes and five Hard Disk Drive (HDD) volumes.
Note
This example is intended for test environments only. Configure your local storage according to your production requirements.
#!/bin/bash
for i in $(seq 1 5); do
  mkdir -p /mnt/ssd-bind/vol${i}
  mkdir -p /mnt/ssd/vol${i}
  mount --bind /mnt/ssd-bind/vol${i} /mnt/ssd/vol${i}
done
for i in $(seq 1 5); do
  mkdir -p /mnt/hdd-bind/vol${i}
  mkdir -p /mnt/hdd/vol${i}
  mount --bind /mnt/hdd-bind/vol${i} /mnt/hdd/vol${i}
done
Install the local volume provisioner.
Note
The local volume provisioner manages the PV lifecycle for pre-allocated disks by detecting and creating PVs for each local disk on the host, and then cleaning up the disks when released. It does not support dynamic provisioning.
a. Define a YAML file to configure the local volume provisioner.
Here is an example of the YAML file used for configuring the local volume provisioner.
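A minimal sketch, assuming the mount paths from the preallocation script above and the ConfigMap format used by the Kubernetes sig-storage local static provisioner (the storage class names and namespace are assumptions):

```yaml
# Hypothetical local volume provisioner ConfigMap
# (class names, namespace, and paths are assumptions)
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-provisioner-config
  namespace: kube-system
data:
  storageClassMap: |
    local-ssd:
      hostDir: /mnt/ssd
      mountDir: /mnt/ssd
    local-hdd:
      hostDir: /mnt/hdd
      mountDir: /mnt/hdd
```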
b. Apply the YAML file to install the local volume provisioner.
kubectl apply -f /path/to/local-volume-provisioner/file.yaml
Verify that the local volume provisioner is created successfully.
kubectl get po -n <k8s_namespace> | grep local-volume
Verify that all PVs are created successfully.
kubectl get pv
Verify that all storage classes are created successfully.
kubectl get storageclasses
Kubernetes default StorageClass
If you do not provide volumes.data.storageClassName in the values.yaml file, the Pulsar operator uses the default storage class.
Use the command below to get the name of the current default storage class:
kubectl get sc
To use the Kubernetes default storage class, it is recommended to set the following properties on the default StorageClass:
- volumeBindingMode: WaitForFirstConsumer
- reclaimPolicy: Retain
- allowVolumeExpansion: true (required for production deployments)
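With these recommendations applied, a default StorageClass manifest might look like the following sketch (the class name and provisioner are placeholders; substitute the provisioner your cluster actually uses):

```yaml
# Hypothetical default StorageClass with the recommended settings
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pulsar-storage
  annotations:
    # marks this class as the cluster default
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com  # placeholder; use your cluster's provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
allowVolumeExpansion: true
```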
Multiple volumes
Pulsar uses Apache BookKeeper for persistent message storage. The BookKeeper server (bookie) uses ledgers and journals to manage data updates and transaction logs. BookKeeper supports concurrent writes. To persist data and avoid data loss, you can configure multiple volumes, as well as multiple directories per volume, for journals and ledgers in the values.yaml file.
Volume configurations for journals
bookkeeper:
  volumes:
    # use a persistent volume or emptyDir
    persistence: true
    journal:
      # It determines the directory of journal data
      numVolumes: 1        # --- [1]
      numDirsPerVolume: 1  # --- [2]
- [1] numVolumes: the number of volumes used for journal data.
- [2] numDirsPerVolume: the number of directories per volume that BookKeeper writes its Write-Ahead Logs (WAL) to.
Volume configurations for ledgers
bookkeeper:
  volumes:
    # use a persistent volume or emptyDir
    persistence: true
    ledgers:
      name: ledgers
      size: 50Gi
      # It determines the directory of ledgers data
      numVolumes: 1        # --- [1]
      numDirsPerVolume: 1  # --- [2]
- [1] numVolumes: the number of volumes used for ledger data.
- [2] numDirsPerVolume: the number of directories per volume that BookKeeper writes ledger snapshots to.
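For example, to spread journals and ledgers across several volumes, you could combine the two sections as follows (the volume counts and size below are illustrative, not recommendations):

```yaml
# Illustrative multi-volume layout in values.yaml
bookkeeper:
  volumes:
    persistence: true
    journal:
      numVolumes: 2        # two journal volumes per bookie
      numDirsPerVolume: 1  # one WAL directory on each volume
    ledgers:
      name: ledgers
      size: 50Gi
      numVolumes: 4        # four ledger volumes per bookie
      numDirsPerVolume: 2  # two ledger directories on each volume
```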
Tiered Storage
Tiered Storage makes storing huge volumes of data in Pulsar manageable by reducing operational burden and cost. The fundamental idea is to separate data storage from data processing, allowing each to scale independently. With Tiered Storage, you can send data to cost-effective object storage, and scale brokers only when you need more compute resources.
StreamNative Platform supports the following object storage solutions for Tiered Storage:
- AWS S3
- Google Cloud Storage
- Azure Blob Storage
Enable Tiered Storage
Starting from StreamNative Platform 1.3.0, you can enable Tiered Storage by setting broker.offload.enabled=true. When you enable Tiered Storage, you need to configure the type of blob storage to use and its related properties, such as the bucket or container, the region, and the credentials.
When a Pulsar cluster is deleted, StreamNative Platform does not perform a garbage collection of the Tiered Storage bucket contents. You can either wait for the set deletion interval or manually delete the objects in the Tiered Storage bucket.
To disable Tiered Storage, set broker.offload.enabled=false.
Configure Tiered Storage for AWS S3
Before enabling Tiered Storage on Amazon Web Services (AWS) with Amazon Simple Storage Service (S3 buckets), you need to configure the following:
Generate an AWS access key and secret access key.
Create an AWS S3 bucket.
Create a Kubernetes secret to save your AWS credentials with the command below. When you configure Tiered Storage, you can specify the Kubernetes secret. Pulsar brokers use the credentials stored in the Kubernetes secret to access the storage container. When your storage credentials change, you need to restart the Pulsar cluster.
kubectl -n <k8s_namespace> create secret generic [secret name] \
  --from-literal=AWS_ACCESS_KEY_ID=<aws_access_key> \
  --from-literal=AWS_SECRET_ACCESS_KEY=<aws_secret_key>
To enable Tiered Storage for AWS S3, set the following fields in the values.yaml file:
broker:
offload:
enabled: true
managedLedgerMinLedgerRolloverTimeMinutes: ''
managedLedgerMaxEntriesPerLedger: ''
managedLedgerOffloadDriver: ''
s3:
enabled: true
s3ManagedLedgerOffloadRegion: '[YOUR REGION OF S3]'
s3ManagedLedgerOffloadBucket: '[YOUR BUCKET OF S3]'
s3ManagedLedgerOffloadMaxBlockSizeInBytes: ''
s3ManagedLedgerOffloadReadBufferSizeInBytes: ''
s3ManagedLedgerOffloadServiceEndpoint: ''
secret: '[the name of the created Kubernetes secret]'
This table outlines fields available for configuring Tiered Storage for AWS S3.
Field | Description | Default | Required or not |
---|---|---|---|
broker.offload.enabled | Enable Tiered Storage. | false | Required |
broker.offload.managedLedgerMinLedgerRolloverTimeMinutes | The minimum time between ledger rollovers for a topic. It is not recommended to set this field in production environments. | 10 | Optional |
broker.offload.managedLedgerMaxEntriesPerLedger | The maximum number of entries to append to a ledger before triggering a rollover. It is not recommended to set this field in production environments. | 50000 | Optional |
broker.offload.managedLedgerOffloadDriver | The offloader driver name, which is case-insensitive. Besides aws-s3, there is a driver type S3, which is identical to aws-s3 except that it requires you to specify an endpoint URL using the s3ManagedLedgerOffloadServiceEndpoint field. This is useful if you use an S3-compatible data store other than AWS S3. | aws-s3 | Required |
broker.offload.s3.enabled | Enable Tiered Storage for AWS S3. | false | Required |
broker.offload.s3.s3ManagedLedgerOffloadRegion | The AWS S3 bucket region. Before specifying a value for this parameter, you need to perform the following operations; otherwise, you might get an error. - Set s3ManagedLedgerOffloadServiceEndpoint, such as s3ManagedLedgerOffloadServiceEndpoint=https://s3.YOUR_REGION.amazonaws.com. - Grant the GetBucketLocation permission to a user. For details about how to grant the GetBucketLocation permission to a user, see bucket operations. | N/A | Optional |
broker.offload.s3.s3ManagedLedgerOffloadBucket | The AWS S3 bucket. | N/A | Required |
broker.offload.s3.s3ManagedLedgerOffloadMaxBlockSizeInBytes | The maximum size of a block sent during a multi-part upload to AWS S3. It cannot be smaller than 5 MB. | 64 MB | Required |
broker.offload.s3.s3ManagedLedgerOffloadReadBufferSizeInBytes | The block size for each individual read when reading data from AWS S3. | 1 MB | Required |
broker.offload.s3.s3ManagedLedgerOffloadServiceEndpoint | An alternative AWS S3 endpoint to connect to (for testing purposes). | N/A | Required |
broker.offload.s3.secret | The Kubernetes secret that stores the AWS credentials. | N/A | Required |
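The block-size fields take byte values rather than human-readable sizes. Assuming the defaults above, 64 MB and 1 MB would be written as follows (64 * 1024 * 1024 and 1024 * 1024 bytes):

```yaml
# Byte equivalents of the default block sizes
s3ManagedLedgerOffloadMaxBlockSizeInBytes: '67108864'   # 64 MB
s3ManagedLedgerOffloadReadBufferSizeInBytes: '1048576'  # 1 MB
```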
Configure Tiered Storage for Google Cloud Storage
Before enabling Tiered Storage with Google Cloud Storage (GCS), you need to configure the following:
Create a GCS service account.
Create a GCS bucket.
Create a Kubernetes secret to save your Google credentials with the following command. When you configure Tiered Storage, you can specify the Kubernetes secret. Pulsar brokers use the credentials stored in the Kubernetes secret to access the storage container. When your storage credentials change, you need to restart the Pulsar cluster.
kubectl -n <k8s_namespace> create secret generic [secret name] \
  --from-file=<gcs_service_account_path>
To enable Tiered Storage for Google Cloud Storage, set the following fields in the values.yaml file:
broker:
offload:
enabled: true
managedLedgerMinLedgerRolloverTimeMinutes: ''
managedLedgerMaxEntriesPerLedger: ''
managedLedgerOffloadDriver: ''
gcs:
enabled: true
gcsManagedLedgerOffloadRegion: '[YOUR REGION OF GCS]'
gcsManagedLedgerOffloadBucket: '[YOUR BUCKET OF GCS]'
gcsManagedLedgerOffloadMaxBlockSizeInBytes: ''
gcsManagedLedgerOffloadReadBufferSizeInBytes: ''
secret: '[the name of the created Kubernetes secret]'
This table outlines fields available for configuring Tiered Storage for Google Cloud Storage.
Field | Description | Default | Required or not |
---|---|---|---|
broker.offload.enabled | Enable Tiered Storage. | false | Required |
broker.offload.managedLedgerMinLedgerRolloverTimeMinutes | The minimum time between ledger rollovers for a topic. It is not recommended to set this field in production environments. | 10 | Optional |
broker.offload.managedLedgerMaxEntriesPerLedger | The maximum number of entries to append to a ledger before triggering a rollover. It is not recommended to set this field in production environments. | 50000 | Optional |
broker.offload.managedLedgerOffloadDriver | The offloader driver name, which is case-insensitive. | google-cloud-storage | Required |
broker.offload.gcs.enabled | Enable Tiered Storage for Google Cloud Storage. | false | Required |
broker.offload.gcs.gcsManagedLedgerOffloadRegion | The Google Cloud Storage bucket region. | N/A | Required |
broker.offload.gcs.gcsManagedLedgerOffloadBucket | The Google Cloud Storage bucket. | N/A | Required |
broker.offload.gcs.gcsManagedLedgerOffloadMaxBlockSizeInBytes | The maximum size of a block sent during a multi-part upload to Google Cloud Storage. It cannot be smaller than 5 MB. | 64 MB | Optional |
broker.offload.gcs.gcsManagedLedgerOffloadReadBufferSizeInBytes | The block size for each individual read when reading data from Google Cloud Storage. | 1 MB | Optional |
broker.offload.gcs.secret | The Kubernetes secret that stores the Google credentials. | N/A | Required |
Configure Tiered Storage for Azure Blob Storage
Before enabling Tiered Storage with Azure Blob Storage, you need to configure the following:
Create an Azure storage account and a storage account access key.
Create an Azure Blob container.
Create a Kubernetes secret to save your Azure credentials with the command below. When you configure Tiered Storage, you can specify the Kubernetes secret. Pulsar brokers use the credentials stored in the Kubernetes secret to access the storage container. When your storage credentials change, you need to restart the Pulsar cluster.
kubectl -n <k8s_namespace> create secret generic \ --from-literal=AZURE_STORAGE_ACCOUNT=<azure_storage_account> \ --from-literal=AZURE_STORAGE_ACCESS_KEY=<azure_storage_access_key> \ [secret name]
To enable Tiered Storage for Azure Blob Storage, set the following fields in the values.yaml file:
broker:
offload:
enabled: true
managedLedgerMinLedgerRolloverTimeMinutes: ''
managedLedgerMaxEntriesPerLedger: ''
managedLedgerOffloadDriver: ''
azureblob:
enabled: true
managedLedgerOffloadBucket: '[YOUR BLOB CONTAINER]'
managedLedgerOffloadMaxBlockSizeInBytes: ''
managedLedgerOffloadReadBufferSizeInBytes: ''
managedLedgerOffloadServiceEndpoint: ''
secret: '[the name of the created Kubernetes secret]'
This table outlines fields available for configuring Tiered Storage for Azure Blob Storage.
Field | Description | Default | Required or not |
---|---|---|---|
broker.offload.enabled | Enable Tiered Storage. | false | Required |
broker.offload.managedLedgerMinLedgerRolloverTimeMinutes | The minimum time between ledger rollovers for a topic. It is not recommended to set this field in production environments. | 10 | Optional |
broker.offload.managedLedgerMaxEntriesPerLedger | The maximum number of entries to append to a ledger before triggering a rollover. It is not recommended to set this field in production environments. | 50000 | Optional |
broker.offload.managedLedgerOffloadDriver | The offloader driver name, which is case-insensitive. | azureblob | Required |
broker.offload.azureblob.enabled | Enable Tiered Storage for Azure Blob Storage. | false | Required |
broker.offload.azureblob.managedLedgerOffloadBucket | The Azure Blob container. | N/A | Required |
broker.offload.azureblob.managedLedgerOffloadMaxBlockSizeInBytes | The maximum size of a block sent during a multi-part upload to Azure Blob Storage. It cannot be smaller than 5 MB. | 64 MB | Optional |
broker.offload.azureblob.managedLedgerOffloadReadBufferSizeInBytes | The block size for each individual read when reading data from Azure Blob Storage. | 1 MB | Optional |
broker.offload.azureblob.managedLedgerOffloadServiceEndpoint | An alternative Azure Blob Storage endpoint to connect to (for testing purposes). | N/A | Required |
broker.offload.azureblob.secret | The Kubernetes secret that stores the Azure credentials. | N/A | Required |