Prerequisites
- A StreamNative Private Cloud environment with the StreamNative operator (PulsarCoordinator) installed
- An object storage bucket (AWS S3, GCS, or Azure Blob Storage)
- An IAM role or service account with read/write permissions on the bucket
- A Kubernetes namespace (e.g., pulsar) for deploying resources
1. ServiceAccount and IAM
Create a Kubernetes ServiceAccount that binds to your cloud IAM role. This grants the Pulsar broker and compaction service access to the object storage bucket.
AWS (EKS)
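On EKS, this binding is typically done with IAM Roles for Service Accounts (IRSA), where an annotation on the ServiceAccount names the IAM role to assume. The manifest below is a minimal sketch rather than the exact manifest for this deployment: the ServiceAccount name `ursa-broker` is hypothetical, and the account ID and role name are placeholders. It assumes the cluster has an IAM OIDC provider and that the role's trust policy allows this ServiceAccount.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ursa-broker        # hypothetical name; use whatever your broker spec references
  namespace: pulsar        # the namespace from the prerequisites
  annotations:
    # IRSA annotation: pods using this ServiceAccount assume the named IAM role.
    # Replace the placeholders with your AWS account ID and role name.
    eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<role-name>"
```

The IAM role itself still needs read/write permissions on the bucket, as listed in the prerequisites.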
GCS (GKE)
Coming Soon.
Azure (AKS)
Coming Soon.
2. Secrets
Create secrets for broker credentials and cloud provider configuration.
3. PulsarCoordinator
Deploy the StreamNative operator.
4. Oxia Cluster and Namespaces
Oxia provides metadata storage, leader election for the compaction service, and offset index management.
OxiaCluster
OxiaNamespaces
Three namespaces are required:
| Namespace | Shards | Purpose |
|---|---|---|
| broker | 3 | Broker metadata and coordination |
| ursa-schema | 1 | Schema version storage |
| ursa-storage | 32 | WAL storage metadata and offset index |
5. StorageCatalog
The StorageCatalog resource connects the Pulsar broker to Oxia and object storage:
| Field | Description |
|---|---|
| oxiaMetadataServiceUrl | Oxia endpoint for broker metadata |
| storageUrl | Oxia endpoint for WAL storage metadata |
| schemaStorageUrl | Oxia endpoint for schema storage |
| backendStorageType | S3, GCS, or AZUREBLOB |
| bucket | Object storage bucket name |
| region | Bucket region |
| prefix | Key prefix within the bucket |
| useOwnStorage | true to use the configured bucket for WAL storage |
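To show how these fields relate to the three Oxia namespaces, here is a hedged sketch of a StorageCatalog spec fragment. Only the field names come from the table above. Their placement directly under `spec`, the URL format, and every value in angle brackets are assumptions or placeholders, and `apiVersion`, `kind`, and `metadata` are left out because they are not given here. Check the CRD installed in your cluster for the authoritative schema.

```yaml
# Fragment only: apiVersion, kind, and metadata are intentionally omitted.
# Nesting directly under spec is an assumption, not a confirmed schema.
spec:
  # Each URL is assumed to point at the Oxia namespace with the matching purpose.
  oxiaMetadataServiceUrl: "<oxia-endpoint>/broker"        # broker metadata
  storageUrl: "<oxia-endpoint>/ursa-storage"              # WAL storage metadata
  schemaStorageUrl: "<oxia-endpoint>/ursa-schema"         # schema storage
  backendStorageType: S3                                  # or GCS, AZUREBLOB
  bucket: "<bucket-name>"
  region: "<bucket-region>"
  prefix: "<key-prefix>"
  useOwnStorage: true                                     # use this bucket for WAL storage
```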
6. PulsarBroker with Compaction Service
The PulsarBroker resource includes the compaction scheduler configuration that enables lakehouse integration:
Key Configuration Fields
| Field | Description |
|---|---|
| useStorageCatalog: true | Enables Ursa storage on the broker |
| managedLedgerOffloadConfig.enabled: true | Enables tiered storage offloading |
| compactionScheduler.enabled: true | Enables the compaction service |
| compactionScheduler.replicas | Number of compaction service pods |
| config.backendStorageType | Storage backend: S3, GCS, or AZUREBLOB |
| config.cloudStorageConfig.bucketName | Bucket for compacted lakehouse data |
| config.cloudStorageConfig.region | Bucket region |
| config.cloudStorageConfig.prefix | Key prefix for compacted data |
| config.lakehouseType | Table format: iceberg, delta, or delta_and_iceberg |
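The dotted paths in the table can be read as nested YAML keys. The fragment below is a sketch under that reading, not a verified PulsarBroker manifest. It assumes the `config.*` entries sit under `compactionScheduler` and that the first two fields sit directly under `spec`. The replica count and lakehouse type are illustrative choices, and angle-bracket values are placeholders.

```yaml
# Fragment only: apiVersion, kind, metadata, and unrelated broker settings are omitted.
# The nesting is inferred from the dotted field paths and may differ in the actual CRD.
spec:
  useStorageCatalog: true            # enable Ursa storage on the broker
  managedLedgerOffloadConfig:
    enabled: true                    # enable tiered storage offloading
  compactionScheduler:
    enabled: true                    # run the compaction service
    replicas: 1                      # illustrative; number of compaction service pods
    config:
      backendStorageType: S3         # or GCS, AZUREBLOB
      cloudStorageConfig:
        bucketName: "<bucket-name>"  # bucket for compacted lakehouse data
        region: "<bucket-region>"
        prefix: "<key-prefix>"       # key prefix for compacted data
      lakehouseType: iceberg         # or delta, delta_and_iceberg
```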
Compaction Service Tuning (Optional)
Additional properties can be added under compactionScheduler.config:
| Property | Description | Default |
|---|---|---|
| compactedFileSizeLimit | Maximum Parquet file size before flush | 256 MB |
| tailCompactDataVisibilityIntervalInSeconds | Delay before data becomes visible in lakehouse | 180s |
| maxCommitIntervalInSeconds | Maximum interval between commits | 180s |
| walReadRateLimitInBytesPerSecond | WAL read throughput rate limit | 50 MB/s |
| lakehouseCommitMaxRetryTimes | Maximum retries for failed commits | 3 |
| compactedThreadNum | Number of compaction worker threads | CPU count - 1 |
| commitThreadNum | Number of commit threads | CPU count |
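As an illustration, the fragment below writes out the defaults for the properties whose unit is part of the property name. It is a sketch with assumptions: the values are shown as plain integers, and 50 MB/s is converted as 50 × 1024 × 1024 = 52428800 bytes per second (use 50000000 if the default is meant as a decimal megabyte). `compactedFileSizeLimit` is left out because its unit is not stated here.

```yaml
# Fragment only: merge into the compactionScheduler section of the PulsarBroker spec.
compactionScheduler:
  config:
    tailCompactDataVisibilityIntervalInSeconds: 180   # default: 180s
    maxCommitIntervalInSeconds: 180                   # default: 180s
    walReadRateLimitInBytesPerSecond: 52428800        # 50 MB/s, assuming 1 MB = 1024 * 1024 bytes
    lakehouseCommitMaxRetryTimes: 3                   # default: 3
```

Lowering the two interval settings would, going by the descriptions above, make data visible in the lakehouse sooner at the cost of more frequent commits.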
7. Cloud Provider Variants
GCS
Coming Soon.
Azure
Coming Soon.
Next Steps
After deploying the infrastructure, proceed to:
- Prepare Lakehouse Catalogs: Set up your external catalog service
- Register Lakehouse Catalogs: Connect catalogs to the compaction service
- Enable Lakehouse Integration: Enable at cluster, namespace, or topic level