Deploy Lakehouse Table

This guide walks through the Kubernetes resources needed to deploy Lakehouse Table in a StreamNative Private Cloud environment.

Prerequisites

A StreamNative Private Cloud environment with the StreamNative operator (PulsarCoordinator) installed
An object storage bucket (AWS S3, GCS, or Azure Blob Storage)
An IAM role or service account with read/write permissions on the bucket
A Kubernetes namespace (e.g., pulsar) for deploying the resources

1. ServiceAccount and IAM

Create a Kubernetes ServiceAccount that binds to your cloud IAM role. This grants the Pulsar broker and compaction service access to the object storage bucket.

AWS (EKS)

apiVersion: v1
kind: ServiceAccount
metadata:
  name: private-cloud-broker
  namespace: pulsar
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<your-account-id>:role/<your-role-name>

The IAM role must have the following permissions on the storage bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<your-bucket>/*",
        "arn:aws:s3:::<your-bucket>"
      ]
    }
  ]
}
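
If the same policy is needed for more than one bucket, it can be generated rather than edited by hand. The following is a minimal sketch in Python; the helper name `lakehouse_bucket_policy` and the bucket name `example-bucket` are hypothetical, while the action list and the two ARN shapes are copied from the policy above.

```python
import json

# Actions copied from the policy above.
ACTIONS = [
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:ListBucket",
    "s3:GetBucketLocation",
]


def lakehouse_bucket_policy(bucket: str) -> dict:
    """Build the policy document for one bucket (hypothetical helper)."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a bare bucket name, without a path")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(ACTIONS),
                "Resource": [
                    f"arn:aws:s3:::{bucket}/*",  # objects in the bucket
                    f"arn:aws:s3:::{bucket}",    # the bucket itself
                ],
            }
        ],
    }


# "example-bucket" is a placeholder, not a real bucket.
print(json.dumps(lakehouse_bucket_policy("example-bucket"), indent=2))
```

Both resource entries are kept because the policy above lists both the bucket ARN and the object ARN.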

GCS (GKE)

Coming Soon.

Azure (AKS)

Coming Soon.

2. Secrets

Create two secrets: one for the broker client credentials and one for the cloud provider configuration (here, the AWS region):

apiVersion: v1
kind: Secret
metadata:
  name: pulsarcluster-private-cloud
  namespace: pulsar
type: Opaque
stringData:
  broker_client_credential.json: "{}"
---
apiVersion: v1
kind: Secret
metadata:
  name: pulsarcluster-private-cloud-aws
  namespace: pulsar
type: Opaque
stringData:
  config: |
    [default]
    region = <your-region>

3. PulsarCoordinator

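Create the PulsarCoordinator resource. The OxiaCluster and PulsarBroker resources in the following steps refer to it through the k8s.streamnative.io/coordinator-name label: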

apiVersion: k8s.streamnative.io/v1alpha1
kind: PulsarCoordinator
metadata:
  name: private-cloud
  namespace: pulsar
spec:
  image: streamnative/private-cloud:4.2.0.5

4. Oxia Cluster and Namespaces

Oxia provides metadata storage, leader election for the compaction service, and offset index management.

OxiaCluster

apiVersion: k8s.streamnative.io/v1alpha1
kind: OxiaCluster
metadata:
  labels:
    k8s.streamnative.io/coordinator-name: private-cloud
  name: private-cloud
  namespace: pulsar
spec:
  monitoringEnabled: true
  image: oxia/oxia:0.16.2
  imagePullPolicy: IfNotPresent
  server:
    replicas: 3
    persistentVolumeClaimRetentionPolicy:
      whenDeleted: Delete

OxiaNamespaces

Three namespaces are required:

# Broker metadata (3 shards, replication factor 3)
apiVersion: k8s.streamnative.io/v1alpha1
kind: OxiaNamespace
metadata:
  name: broker
  namespace: pulsar
spec:
  namespaceConfig:
    name: broker
    initialShardCount: 3
    replicationFactor: 3
  clusterRef:
    name: private-cloud
    namespace: pulsar
---
# Schema storage (1 shard, replication factor 3)
apiVersion: k8s.streamnative.io/v1alpha1
kind: OxiaNamespace
metadata:
  name: ursa-schema
  namespace: pulsar
spec:
  namespaceConfig:
    name: ursa-schema
    initialShardCount: 1
    replicationFactor: 3
  clusterRef:
    name: private-cloud
    namespace: pulsar
---
# WAL storage metadata (32 shards, replication factor 3)
apiVersion: k8s.streamnative.io/v1alpha1
kind: OxiaNamespace
metadata:
  name: ursa-storage
  namespace: pulsar
spec:
  namespaceConfig:
    name: ursa-storage
    initialShardCount: 32
    replicationFactor: 3
  clusterRef:
    name: private-cloud
    namespace: pulsar

Namespace	Shards	Purpose
`broker`	3	Broker metadata and coordination
`ursa-schema`	1	Schema version storage
`ursa-storage`	32	WAL storage metadata and offset index
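
The three manifests differ only in name and shard count, so they can be produced from the table. The following is a sketch in Python under that assumption; the function names are hypothetical, and the field names and values are copied from the manifests above. The output is JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json

# (Oxia namespace name, initial shard count), taken from the table above.
NAMESPACES = [("broker", 3), ("ursa-schema", 1), ("ursa-storage", 32)]


def oxia_namespace(name, shards, cluster="private-cloud",
                   k8s_namespace="pulsar", replication_factor=3):
    """Return one OxiaNamespace manifest as a dict (hypothetical helper)."""
    return {
        "apiVersion": "k8s.streamnative.io/v1alpha1",
        "kind": "OxiaNamespace",
        "metadata": {"name": name, "namespace": k8s_namespace},
        "spec": {
            "namespaceConfig": {
                "name": name,
                "initialShardCount": shards,
                "replicationFactor": replication_factor,
            },
            "clusterRef": {"name": cluster, "namespace": k8s_namespace},
        },
    }


def oxia_namespace_list():
    """Wrap all three manifests in a single Kubernetes List object."""
    items = [oxia_namespace(name, shards) for name, shards in NAMESPACES]
    return {"apiVersion": "v1", "kind": "List", "items": items}


print(json.dumps(oxia_namespace_list(), indent=2))
```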

5. StorageCatalog

The StorageCatalog resource connects the Pulsar broker to Oxia and object storage:

apiVersion: k8s.streamnative.io/v1alpha1
kind: StorageCatalog
metadata:
  name: private-cloud
  namespace: pulsar
spec:
  oxiaMetadataServiceUrl: oxia://private-cloud-oxia:6648/broker
  storageUrl: oxia://private-cloud-oxia:6648/ursa-storage
  schemaStorageUrl: oxia://private-cloud-oxia:6648/ursa-schema
  backendStorageType: S3
  bucket: <your-bucket-name>
  region: <your-region>
  prefix: private-cloud/storage
  useOwnStorage: true

Field	Description
`oxiaMetadataServiceUrl`	Oxia endpoint for broker metadata
`storageUrl`	Oxia endpoint for WAL storage metadata
`schemaStorageUrl`	Oxia endpoint for schema storage
`backendStorageType`	`S3`, `GCS`, or `AZUREBLOB`
`bucket`	Object storage bucket name
`region`	Bucket region
`prefix`	Key prefix within the bucket
`useOwnStorage`	`true` to use the configured bucket for WAL storage
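
In the example, each of the three Oxia URLs has the form oxia://<host>:<port>/<oxia-namespace> and points at one of the namespaces created in step 4. The Python sketch below rebuilds them to make that mapping explicit. The function names are hypothetical, and the default host `private-cloud-oxia` and port `6648` are simply the values used in the manifest above; check the Oxia service name in your own cluster.

```python
def oxia_url(oxia_namespace, host="private-cloud-oxia", port=6648):
    """Build an Oxia URL in the form used by the StorageCatalog example."""
    return f"oxia://{host}:{port}/{oxia_namespace}"


def storage_catalog_urls(host="private-cloud-oxia", port=6648):
    """Map each StorageCatalog URL field to its Oxia namespace."""
    return {
        "oxiaMetadataServiceUrl": oxia_url("broker", host, port),
        "storageUrl": oxia_url("ursa-storage", host, port),
        "schemaStorageUrl": oxia_url("ursa-schema", host, port),
    }


for field, url in storage_catalog_urls().items():
    print(f"{field}: {url}")
```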

6. PulsarBroker with Compaction Service

The PulsarBroker resource includes the compaction scheduler configuration that enables lakehouse integration:

apiVersion: pulsar.streamnative.io/v1alpha1
kind: PulsarBroker
metadata:
  name: private-cloud
  namespace: pulsar
  annotations:
    cloud.streamnative.io/config-profile: default-config-v2
    cloud.streamnative.io/open-telemetry-enabled: "true"
    cloud.streamnative.io/open-telemetry-exporter: "prometheus"
    cloud.streamnative.io/open-telemetry-exporter-prometheus-port: "9464"
  labels:
    k8s.streamnative.io/coordinator-name: private-cloud
spec:
  image: streamnative/private-cloud:4.1.0.15
  replicas: 1
  zkServers: private-cloud-zk:2181
  config:
    custom:
      PULSAR_PREFIX_managedLedgerOffloadAutoTriggerSizeThresholdBytes: "0"
      PULSAR_PREFIX_managedLedgerOffloadThresholdInSeconds: "0"
    clusterName: private-cloud
    useStorageCatalog: true
    managedLedgerOffloadConfig:
      enabled: true
    compactionScheduler:
      enabled: true
      replicas: 1
      pod:
        jvmOptions:
          extraOptions:
            - -Dotel.sdk.disabled=false
            - -Dotel.java.disabled.resource.providers=io.opentelemetry.instrumentation.resources.ProcessResourceProvider
            - -Dotel.metrics.exporter=prometheus
            - -Dotel.exporter.prometheus.port=9464
      config:
        deployMode: StatefulSet
        backendStorageType: S3
        cloudStorageConfig:
          bucketName: <your-bucket-name>
          region: <your-region>
          prefix: private-cloud/compaction
        lakehouseType: iceberg
        streamTableMode: EXTERNAL
    function:
      enabled: false
      mesh:
        uploadEnabled: false
  pod:
    serviceAccountName: private-cloud-broker
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
    securityContext:
      runAsNonRoot: true

Key Configuration Fields

Field	Description
`useStorageCatalog: true`	Enables Ursa storage on the broker
`managedLedgerOffloadConfig.enabled: true`	Enables tiered storage offloading
`compactionScheduler.enabled: true`	Enables the compaction service
`compactionScheduler.replicas`	Number of compaction service pods
`compactionScheduler.config.backendStorageType`	Storage backend: `S3`, `GCS`, or `AZUREBLOB`
`compactionScheduler.config.cloudStorageConfig.bucketName`	Bucket for compacted lakehouse data
`compactionScheduler.config.cloudStorageConfig.region`	Bucket region
`compactionScheduler.config.cloudStorageConfig.prefix`	Key prefix for compacted data
`compactionScheduler.config.lakehouseType`	Table format: `iceberg`, `delta`, or `delta_and_iceberg`

Compaction Service Tuning (Optional)

Additional properties can be added under compactionScheduler.config:

Property	Description	Default
`compactedFileSizeLimit`	Maximum Parquet file size before flush	256 MB
`tailCompactDataVisibilityIntervalInSeconds`	Delay before data becomes visible in lakehouse	180s
`maxCommitIntervalInSeconds`	Maximum interval between commits	180s
`walReadRateLimitInBytesPerSecond`	WAL read throughput rate limit	50 MB/s
`lakehouseCommitMaxRetryTimes`	Maximum retries for failed commits	3
`compactedThreadNum`	Number of compaction worker threads	CPU count - 1
`commitThreadNum`	Number of commit threads	CPU count

7. Cloud Provider Variants

GCS

Coming Soon.

Azure

Coming Soon.

Next Steps

After deploying the infrastructure, proceed to:

Prepare Lakehouse Catalogs — Set up your external catalog service
Register Lakehouse Catalogs — Connect catalogs to the compaction service
Enable Lakehouse Integration — Enable at cluster, namespace, or topic level
