This guide walks through the Kubernetes resources needed to deploy Lakehouse Tables in a StreamNative Private Cloud environment.

Prerequisites

  • A StreamNative Private Cloud environment with the StreamNative operator (PulsarCoordinator) installed
  • An object storage bucket (AWS S3, GCS, or Azure Blob Storage)
  • An IAM role or service account with read/write permissions on the bucket
  • A Kubernetes namespace (e.g., pulsar) in which to deploy the resources

1. ServiceAccount and IAM

Create a Kubernetes ServiceAccount that binds to your cloud IAM role. This grants the Pulsar broker and compaction service access to the object storage bucket.

AWS (EKS)

apiVersion: v1
kind: ServiceAccount
metadata:
  name: private-cloud-broker
  namespace: pulsar
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<your-account-id>:role/<your-role-name>
The IAM role must have the following permissions on the storage bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<your-bucket>/*",
        "arn:aws:s3:::<your-bucket>"
      ]
    }
  ]
}
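
Before moving on, it is worth confirming that pods using the ServiceAccount can actually assume the role. A minimal check, assuming the names above (the pod name irsa-check and the amazon/aws-cli image are illustrative choices, not part of the deployment):
# The annotation from the manifest above should appear in the output
kubectl -n pulsar get serviceaccount private-cloud-broker -o yaml

# If IRSA is wired up correctly, this prints the assumed role ARN
# rather than the node's identity
kubectl -n pulsar run irsa-check --rm -it --restart=Never \
  --image=amazon/aws-cli \
  --overrides='{"spec":{"serviceAccountName":"private-cloud-broker"}}' \
  -- sts get-caller-identity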

GCS (GKE)

Coming Soon.

Azure (AKS)

Coming Soon.

2. Secrets

Create secrets for broker credentials and cloud provider configuration:
apiVersion: v1
kind: Secret
metadata:
  name: pulsarcluster-private-cloud
  namespace: pulsar
type: Opaque
stringData:
  broker_client_credential.json: "{}"
---
apiVersion: v1
kind: Secret
metadata:
  name: pulsarcluster-private-cloud-aws
  namespace: pulsar
type: Opaque
stringData:
  config: |
    [default]
    region = <your-region>
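
If you prefer not to manage manifests for secrets, the same objects can be created imperatively. A sketch, assuming ./aws-config is a local file containing the [default] profile shown above:
kubectl -n pulsar create secret generic pulsarcluster-private-cloud \
  --from-literal=broker_client_credential.json='{}'
kubectl -n pulsar create secret generic pulsarcluster-private-cloud-aws \
  --from-file=config=./aws-config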

3. PulsarCoordinator

Deploy the StreamNative operator:
apiVersion: k8s.streamnative.io/v1alpha1
kind: PulsarCoordinator
metadata:
  name: private-cloud
  namespace: pulsar
spec:
  image: streamnative/private-cloud:4.2.0.5
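
Apply the manifest and wait for the operator to become ready before creating the resources below. A sketch, assuming the manifest is saved as pulsar-coordinator.yaml (the plural resource name is an assumption based on the CRD group):
kubectl apply -f pulsar-coordinator.yaml
kubectl -n pulsar get pulsarcoordinators.k8s.streamnative.io
kubectl -n pulsar get pods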

4. Oxia Cluster and Namespaces

Oxia provides metadata storage, leader election for the compaction service, and offset index management.

OxiaCluster

apiVersion: k8s.streamnative.io/v1alpha1
kind: OxiaCluster
metadata:
  labels:
    k8s.streamnative.io/coordinator-name: private-cloud
  name: private-cloud
  namespace: pulsar
spec:
  monitoringEnabled: true
  image: oxia/oxia:0.16.2
  imagePullPolicy: IfNotPresent
  server:
    replicas: 3
    persistentVolumeClaimRetentionPolicy:
      whenDeleted: Delete
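
Once applied, the three Oxia server pods should reach Running before you create the namespaces. A quick check, assuming the plural resource name and that the pod names include an oxia prefix:
kubectl -n pulsar get oxiaclusters.k8s.streamnative.io private-cloud
kubectl -n pulsar get pods | grep oxia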

OxiaNamespaces

Three namespaces are required:
# Broker metadata (3 shards, replication factor 3)
apiVersion: k8s.streamnative.io/v1alpha1
kind: OxiaNamespace
metadata:
  name: broker
  namespace: pulsar
spec:
  namespaceConfig:
    name: broker
    initialShardCount: 3
    replicationFactor: 3
  clusterRef:
    name: private-cloud
    namespace: pulsar
---
# Schema storage (1 shard, replication factor 3)
apiVersion: k8s.streamnative.io/v1alpha1
kind: OxiaNamespace
metadata:
  name: ursa-schema
  namespace: pulsar
spec:
  namespaceConfig:
    name: ursa-schema
    initialShardCount: 1
    replicationFactor: 3
  clusterRef:
    name: private-cloud
    namespace: pulsar
---
# WAL storage metadata (32 shards, replication factor 3)
apiVersion: k8s.streamnative.io/v1alpha1
kind: OxiaNamespace
metadata:
  name: ursa-storage
  namespace: pulsar
spec:
  namespaceConfig:
    name: ursa-storage
    initialShardCount: 32
    replicationFactor: 3
  clusterRef:
    name: private-cloud
    namespace: pulsar

Namespace      Shards   Purpose
------------   ------   --------------------------------------
broker         3        Broker metadata and coordination
ursa-schema    1        Schema version storage
ursa-storage   32       WAL storage metadata and offset index
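
To confirm all three namespaces were registered against the cluster (the plural resource name is an assumption):
kubectl -n pulsar get oxianamespaces.k8s.streamnative.io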

5. StorageCatalog

The StorageCatalog resource connects the Pulsar broker to Oxia and object storage:
apiVersion: k8s.streamnative.io/v1alpha1
kind: StorageCatalog
metadata:
  name: private-cloud
  namespace: pulsar
spec:
  oxiaMetadataServiceUrl: oxia://private-cloud-oxia:6648/broker
  storageUrl: oxia://private-cloud-oxia:6648/ursa-storage
  schemaStorageUrl: oxia://private-cloud-oxia:6648/ursa-schema
  backendStorageType: S3
  bucket: <your-bucket-name>
  region: <your-region>
  prefix: private-cloud/storage
  useOwnStorage: true

Field                    Description
----------------------   ---------------------------------------------------------
oxiaMetadataServiceUrl   Oxia endpoint for broker metadata
storageUrl               Oxia endpoint for WAL storage metadata
schemaStorageUrl         Oxia endpoint for schema storage
backendStorageType       Storage backend: S3, GCS, or AZUREBLOB
bucket                   Object storage bucket name
region                   Bucket region
prefix                   Key prefix within the bucket
useOwnStorage            Set to true to use the configured bucket for WAL storage
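
The three oxia:// URLs point at the same Oxia service but at the three namespaces created in step 4; the service name here appears to follow the <OxiaCluster-name>-oxia pattern. To apply and inspect the catalog, assuming the manifest is saved as storage-catalog.yaml:
kubectl apply -f storage-catalog.yaml
kubectl -n pulsar get storagecatalogs.k8s.streamnative.io private-cloud -o yaml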

6. PulsarBroker with Compaction Service

The PulsarBroker resource includes the compaction scheduler configuration that enables lakehouse integration:
apiVersion: pulsar.streamnative.io/v1alpha1
kind: PulsarBroker
metadata:
  name: private-cloud
  namespace: pulsar
  annotations:
    cloud.streamnative.io/config-profile: default-config-v2
    cloud.streamnative.io/open-telemetry-enabled: "true"
    cloud.streamnative.io/open-telemetry-exporter: "prometheus"
    cloud.streamnative.io/open-telemetry-exporter-prometheus-port: "9464"
  labels:
    k8s.streamnative.io/coordinator-name: private-cloud
spec:
  image: streamnative/private-cloud:4.1.0.15
  replicas: 1
  zkServers: private-cloud-zk:2181
  config:
    custom:
      PULSAR_PREFIX_managedLedgerOffloadAutoTriggerSizeThresholdBytes: "0"
      PULSAR_PREFIX_managedLedgerOffloadThresholdInSeconds: "0"
    clusterName: private-cloud
    useStorageCatalog: true
    managedLedgerOffloadConfig:
      enabled: true
    compactionScheduler:
      enabled: true
      replicas: 1
      pod:
        jvmOptions:
          extraOptions:
            - -Dotel.sdk.disabled=false
            - -Dotel.java.disabled.resource.providers=io.opentelemetry.instrumentation.resources.ProcessResourceProvider
            - -Dotel.metrics.exporter=prometheus
            - -Dotel.exporter.prometheus.port=9464
      config:
        deployMode: StatefulSet
        backendStorageType: S3
        cloudStorageConfig:
          bucketName: <your-bucket-name>
          region: <your-region>
          prefix: private-cloud/compaction
        lakehouseType: iceberg
        streamTableMode: EXTERNAL
    function:
      enabled: false
      mesh:
        uploadEnabled: false
  pod:
    serviceAccountName: private-cloud-broker
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
    securityContext:
      runAsNonRoot: true
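
Apply the broker manifest and watch the rollout; with deployMode: StatefulSet, the compaction scheduler runs as its own StatefulSet alongside the broker pods. A sketch, assuming the manifest is saved as pulsar-broker.yaml (the plural resource name is an assumption):
kubectl apply -f pulsar-broker.yaml
kubectl -n pulsar get pulsarbrokers.pulsar.streamnative.io
kubectl -n pulsar get pods -w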

Key Configuration Fields

Field                                      Description
----------------------------------------   --------------------------------------------------
useStorageCatalog: true                    Enables Ursa storage on the broker
managedLedgerOffloadConfig.enabled: true   Enables tiered storage offloading
compactionScheduler.enabled: true          Enables the compaction service
compactionScheduler.replicas               Number of compaction service pods
config.backendStorageType                  Storage backend: S3, GCS, or AZUREBLOB
config.cloudStorageConfig.bucketName       Bucket for compacted lakehouse data
config.cloudStorageConfig.region           Bucket region
config.cloudStorageConfig.prefix           Key prefix for compacted data
config.lakehouseType                       Table format: iceberg, delta, or delta_and_iceberg

Compaction Service Tuning (Optional)

Additional properties can be added under compactionScheduler.config (a configuration sketch follows the table):
Property                                     Description                                          Default
------------------------------------------   --------------------------------------------------   -------------
compactedFileSizeLimit                       Maximum Parquet file size before flush               256 MB
tailCompactDataVisibilityIntervalInSeconds   Delay before data becomes visible in the lakehouse   180s
maxCommitIntervalInSeconds                   Maximum interval between commits                     180s
walReadRateLimitInBytesPerSecond             WAL read throughput rate limit                       50 MB/s
lakehouseCommitMaxRetryTimes                 Maximum retries for failed commits                   3
compactedThreadNum                           Number of compaction worker threads                  CPU count - 1
commitThreadNum                              Number of commit threads                             CPU count
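
For example, to cap Parquet files at 128 MB and commit every 60 seconds, the properties slot in next to the existing compactionScheduler.config keys. This is a sketch only; the byte-valued units and string quoting are assumptions, so verify them against your release:
compactionScheduler:
  enabled: true
  config:
    deployMode: StatefulSet
    backendStorageType: S3
    # ... existing keys from the manifest above ...
    compactedFileSizeLimit: "134217728"            # 128 MB, assumed to be in bytes
    maxCommitIntervalInSeconds: "60"
    walReadRateLimitInBytesPerSecond: "104857600"  # raise the WAL read limit to 100 MB/s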

7. Cloud Provider Variants

GCS

Coming Soon.

Azure

Coming Soon.

Next Steps

After deploying the infrastructure, proceed to:
  1. Prepare Lakehouse Catalogs — Set up your external catalog service
  2. Configure Lakehouse Catalogs — Connect catalogs to the compaction service
  3. Enable Lakehouse Integration — Enable at cluster, namespace, or topic level