This guide walks through the Kubernetes resources needed to deploy Lakehouse Tables in a StreamNative Private Cloud environment.
## Prerequisites

- A StreamNative Private Cloud environment with the StreamNative operator (`PulsarCoordinator`) installed
- An object storage bucket (AWS S3, GCS, or Azure Blob Storage)
- An IAM role or service account with read/write permissions on the bucket
- A Kubernetes namespace (e.g., `pulsar`) for deploying resources
## 1. ServiceAccount and IAM

Create a Kubernetes ServiceAccount that binds to your cloud IAM role. This grants the Pulsar broker and compaction service access to the object storage bucket.

### AWS (EKS)

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: private-cloud-broker
  namespace: pulsar
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<your-account-id>:role/<your-role-name>
```
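For the `eks.amazonaws.com/role-arn` annotation to take effect, the IAM role's trust policy must allow this ServiceAccount to assume the role through the cluster's OIDC provider (IAM Roles for Service Accounts). A minimal sketch; the OIDC provider ID and region placeholders must be substituted with your cluster's values:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<your-account-id>:oidc-provider/oidc.eks.<your-region>.amazonaws.com/id/<oidc-provider-id>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.<your-region>.amazonaws.com/id/<oidc-provider-id>:sub": "system:serviceaccount:pulsar:private-cloud-broker"
        }
      }
    }
  ]
}
```

The `sub` condition pins the role to the `private-cloud-broker` ServiceAccount in the `pulsar` namespace, so no other workload in the cluster can assume it.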
The IAM role must have the following permissions on the storage bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<your-bucket>/*",
        "arn:aws:s3:::<your-bucket>"
      ]
    }
  ]
}
```
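As a quick sanity check before applying the role, a short script can confirm that a policy document grants every action listed above. This is a minimal illustrative sketch, not an official StreamNative or AWS tool:

```python
import json

# Actions the Lakehouse storage integration needs on the bucket (from the policy above).
REQUIRED_ACTIONS = {
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:ListBucket",
    "s3:GetBucketLocation",
}

def missing_actions(policy_json: str) -> set:
    """Return the required S3 actions not granted by any Allow statement."""
    policy = json.loads(policy_json)
    granted = set()
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # "Action" may be a string or a list
            actions = [actions]
        granted.update(actions)
    return REQUIRED_ACTIONS - granted
```

Running `missing_actions(open("policy.json").read())` should return an empty set for a complete policy; any returned action names the permission still missing. Note the check is intentionally simple: it does not expand `s3:*` wildcards or evaluate `Deny` statements.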
### GCS (GKE)

Coming Soon.

### Azure (AKS)

Coming Soon.
## 2. Secrets

Create secrets for broker credentials and cloud provider configuration:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: pulsarcluster-private-cloud
  namespace: pulsar
type: Opaque
stringData:
  broker_client_credential.json: "{}"
---
apiVersion: v1
kind: Secret
metadata:
  name: pulsarcluster-private-cloud-aws
  namespace: pulsar
type: Opaque
stringData:
  config: |
    [default]
    region = <your-region>
```
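Equivalently, the same two secrets can be created imperatively with kubectl. A sketch assuming the `pulsar` namespace already exists:

```shell
# Broker credentials secret (empty JSON placeholder, matching the manifest)
kubectl create secret generic pulsarcluster-private-cloud \
  --namespace pulsar \
  --from-literal=broker_client_credential.json='{}'

# AWS provider config secret, built from a local INI-style file
cat > aws.config <<'EOF'
[default]
region = <your-region>
EOF
kubectl create secret generic pulsarcluster-private-cloud-aws \
  --namespace pulsar \
  --from-file=config=aws.config
```

Adding `--dry-run=client -o yaml` to either command prints the generated manifest without applying it, which is useful for committing the result to version control.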
## 3. PulsarCoordinator

Deploy the StreamNative operator:

```yaml
apiVersion: k8s.streamnative.io/v1alpha1
kind: PulsarCoordinator
metadata:
  name: private-cloud
  namespace: pulsar
spec:
  image: streamnative/private-cloud:4.2.0.5
```
## 4. Oxia Cluster and Namespaces

Oxia provides metadata storage, leader election for the compaction service, and offset index management.

### OxiaCluster

```yaml
apiVersion: k8s.streamnative.io/v1alpha1
kind: OxiaCluster
metadata:
  labels:
    k8s.streamnative.io/coordinator-name: private-cloud
  name: private-cloud
  namespace: pulsar
spec:
  monitoringEnabled: true
  image: oxia/oxia:0.16.2
  imagePullPolicy: IfNotPresent
  server:
    replicas: 3
    persistentVolumeClaimRetentionPolicy:
      whenDeleted: Delete
```
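Once applied, the operator should bring up the three Oxia server replicas plus a coordinator. A quick check, assuming kubectl access to the cluster (pod naming derived from the `private-cloud` resource name is an assumption about the operator's conventions):

```shell
# Confirm the OxiaCluster resource was accepted
kubectl -n pulsar get oxiacluster private-cloud

# List the Oxia pods; expect three server replicas and a coordinator
kubectl -n pulsar get pods | grep oxia
```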
### OxiaNamespaces

Three namespaces are required:

```yaml
# Broker metadata (3 shards, replication factor 3)
apiVersion: k8s.streamnative.io/v1alpha1
kind: OxiaNamespace
metadata:
  name: broker
  namespace: pulsar
spec:
  namespaceConfig:
    name: broker
    initialShardCount: 3
    replicationFactor: 3
  clusterRef:
    name: private-cloud
    namespace: pulsar
---
# Schema storage (1 shard, replication factor 3)
apiVersion: k8s.streamnative.io/v1alpha1
kind: OxiaNamespace
metadata:
  name: ursa-schema
  namespace: pulsar
spec:
  namespaceConfig:
    name: ursa-schema
    initialShardCount: 1
    replicationFactor: 3
  clusterRef:
    name: private-cloud
    namespace: pulsar
---
# WAL storage metadata (32 shards, replication factor 3)
apiVersion: k8s.streamnative.io/v1alpha1
kind: OxiaNamespace
metadata:
  name: ursa-storage
  namespace: pulsar
spec:
  namespaceConfig:
    name: ursa-storage
    initialShardCount: 32
    replicationFactor: 3
  clusterRef:
    name: private-cloud
    namespace: pulsar
```
| Namespace | Shards | Purpose |
|---|---|---|
| `broker` | 3 | Broker metadata and coordination |
| `ursa-schema` | 1 | Schema version storage |
| `ursa-storage` | 32 | WAL storage metadata and offset index |
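After the resources reconcile, all three namespaces should be listed. A sketch assuming the CRD's resource name follows the usual Kubernetes singular/plural convention for the `OxiaNamespace` kind:

```shell
kubectl -n pulsar get oxianamespace
# Expect broker, ursa-schema, and ursa-storage in the output
```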
## 5. StorageCatalog

The StorageCatalog resource connects the Pulsar broker to Oxia and object storage:

```yaml
apiVersion: k8s.streamnative.io/v1alpha1
kind: StorageCatalog
metadata:
  name: private-cloud
  namespace: pulsar
spec:
  oxiaMetadataServiceUrl: oxia://private-cloud-oxia:6648/broker
  storageUrl: oxia://private-cloud-oxia:6648/ursa-storage
  schemaStorageUrl: oxia://private-cloud-oxia:6648/ursa-schema
  backendStorageType: S3
  bucket: <your-bucket-name>
  region: <your-region>
  prefix: private-cloud/storage
  useOwnStorage: true
```
| Field | Description |
|---|---|
| `oxiaMetadataServiceUrl` | Oxia endpoint for broker metadata |
| `storageUrl` | Oxia endpoint for WAL storage metadata |
| `schemaStorageUrl` | Oxia endpoint for schema storage |
| `backendStorageType` | `S3`, `GCS`, or `AZUREBLOB` |
| `bucket` | Object storage bucket name |
| `region` | Bucket region |
| `prefix` | Key prefix within the bucket |
| `useOwnStorage` | `true` to use the configured bucket for WAL storage |
## 6. PulsarBroker with Compaction Service

The PulsarBroker resource includes the compaction scheduler configuration that enables lakehouse integration:

```yaml
apiVersion: pulsar.streamnative.io/v1alpha1
kind: PulsarBroker
metadata:
  name: private-cloud
  namespace: pulsar
  annotations:
    cloud.streamnative.io/config-profile: default-config-v2
    cloud.streamnative.io/open-telemetry-enabled: "true"
    cloud.streamnative.io/open-telemetry-exporter: "prometheus"
    cloud.streamnative.io/open-telemetry-exporter-prometheus-port: "9464"
  labels:
    k8s.streamnative.io/coordinator-name: private-cloud
spec:
  image: streamnative/private-cloud:4.1.0.15
  replicas: 1
  zkServers: private-cloud-zk:2181
  config:
    custom:
      PULSAR_PREFIX_managedLedgerOffloadAutoTriggerSizeThresholdBytes: "0"
      PULSAR_PREFIX_managedLedgerOffloadThresholdInSeconds: "0"
  clusterName: private-cloud
  useStorageCatalog: true
  managedLedgerOffloadConfig:
    enabled: true
  compactionScheduler:
    enabled: true
    replicas: 1
    pod:
      jvmOptions:
        extraOptions:
          - -Dotel.sdk.disabled=false
          - -Dotel.java.disabled.resource.providers=io.opentelemetry.instrumentation.resources.ProcessResourceProvider
          - -Dotel.metrics.exporter=prometheus
          - -Dotel.exporter.prometheus.port=9464
    config:
      deployMode: StatefulSet
      backendStorageType: S3
      cloudStorageConfig:
        bucketName: <your-bucket-name>
        region: <your-region>
        prefix: private-cloud/compaction
      lakehouseType: iceberg
      streamTableMode: EXTERNAL
  function:
    enabled: false
  mesh:
    uploadEnabled: false
  pod:
    serviceAccountName: private-cloud-broker
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
    securityContext:
      runAsNonRoot: true
```
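After the broker reconciles, both the broker and compaction scheduler pods should be running, and the OpenTelemetry Prometheus endpoint should answer on port 9464 as configured above. A quick sanity check (pod names derived from the `private-cloud` resource name are an assumption about the operator's naming):

```shell
# Broker and compaction scheduler pods should reach Running
kubectl -n pulsar get pods | grep private-cloud

# Spot-check the Prometheus exporter on a broker pod; substitute the actual pod name
kubectl -n pulsar port-forward pod/<broker-pod-name> 9464:9464 &
curl -s http://localhost:9464/metrics | head
```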
### Key Configuration Fields

| Field | Description |
|---|---|
| `useStorageCatalog: true` | Enables Ursa storage on the broker |
| `managedLedgerOffloadConfig.enabled: true` | Enables tiered storage offloading |
| `compactionScheduler.enabled: true` | Enables the compaction service |
| `compactionScheduler.replicas` | Number of compaction service pods |
| `config.backendStorageType` | Storage backend: `S3`, `GCS`, or `AZUREBLOB` |
| `config.cloudStorageConfig.bucketName` | Bucket for compacted lakehouse data |
| `config.cloudStorageConfig.region` | Bucket region |
| `config.cloudStorageConfig.prefix` | Key prefix for compacted data |
| `config.lakehouseType` | Table format: `iceberg`, `delta`, or `delta_and_iceberg` |
### Compaction Service Tuning (Optional)

Additional properties can be added under `compactionScheduler.config`:

| Property | Description | Default |
|---|---|---|
| `compactedFileSizeLimit` | Maximum Parquet file size before flush | 256 MB |
| `tailCompactDataVisibilityIntervalInSeconds` | Delay before data becomes visible in lakehouse | 180s |
| `maxCommitIntervalInSeconds` | Maximum interval between commits | 180s |
| `walReadRateLimitInBytesPerSecond` | WAL read throughput rate limit | 50 MB/s |
| `lakehouseCommitMaxRetryTimes` | Maximum retries for failed commits | 3 |
| `compactedThreadNum` | Number of compaction worker threads | CPU count - 1 |
| `commitThreadNum` | Number of commit threads | CPU count |
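For example, raising the file-size limit and shortening the commit interval would look as follows. The tuning keys sit alongside the existing `compactionScheduler.config` entries; the values below are illustrative, not recommendations, and the assumption that byte-denominated properties are expressed as raw byte counts should be verified against your release:

```yaml
compactionScheduler:
  enabled: true
  replicas: 1
  config:
    deployMode: StatefulSet
    # Illustrative overrides of the defaults listed above
    compactedFileSizeLimit: "536870912"          # 512 MB, assuming a raw byte count
    maxCommitIntervalInSeconds: "60"
    walReadRateLimitInBytesPerSecond: "52428800" # 50 MB/s, assuming a raw byte count
```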
## 7. Cloud Provider Variants

### GCS

Coming Soon.

### Azure

Coming Soon.
## Next Steps

After deploying the infrastructure, proceed to:

- Prepare Lakehouse Catalogs — Set up your external catalog service
- Configure Lakehouse Catalogs — Connect catalogs to the compaction service
- Enable Lakehouse Integration — Enable at cluster, namespace, or topic level