Configure Lakehouse Catalogs

After preparing your external catalog, configure the compaction service to connect to it. Catalog configuration is added to the compactionScheduler.config.custom section of the PulsarBroker YAML.

Multi-Catalog Architecture

StreamNative supports configuring multiple catalogs simultaneously. Different namespaces or topics can route data to different catalogs.

Configuration Pattern

Iceberg catalogs: iceberg.catalog.<catalog-name>.<property>
Delta catalogs: delta.catalog.<catalog-name>.<property>
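
To make the naming pattern concrete, here is a minimal Python sketch that groups flat `custom` keys by table format and catalog name. The function name `group_catalog_properties` and the sample values are hypothetical and are not part of any StreamNative tooling. The sketch assumes catalog names contain no dots, because property names such as `header.X-Iceberg-Access-Delegation` do.

```python
from collections import defaultdict


def group_catalog_properties(custom):
    """Group keys shaped like <format>.catalog.<name>.<property> by (format, name)."""
    catalogs = defaultdict(dict)
    for key, value in custom.items():
        # Split into at most 4 parts so dotted property names stay intact.
        parts = key.split(".", 3)
        if len(parts) == 4 and parts[0] in ("iceberg", "delta") and parts[1] == "catalog":
            table_format, _, name, prop = parts
            catalogs[(table_format, name)][prop] = value
    return dict(catalogs)


# Hypothetical sample input mirroring the pattern above.
sample = {
    "catalog.default": "polaris-prod",
    "iceberg.catalog.polaris-prod.type": "rest",
    "iceberg.catalog.polaris-prod.header.X-Iceberg-Access-Delegation": "vended-credentials",
    "delta.catalog.uc-delta.unityCatalogUri": "https://example.invalid",
}
grouped = group_catalog_properties(sample)
```

Keys that do not match the pattern, such as `catalog.default`, are ignored by the grouping.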

Default Catalog

Set the default catalog used when no topic or namespace override is specified:

custom:
  catalog.default: <catalog-name>

Catalog Resolution Priority

The catalog used for a topic is resolved in this order:

Topic property (catalog.name)
    ↓ (if not set)
Namespace property (catalog.name)
    ↓ (if not set)
Default catalog (catalog.default)

See Enable Lakehouse Integration for how to assign catalogs at namespace and topic level.
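
The resolution order above can be sketched in a few lines of Python. This is an illustration of the documented priority only, not the compaction service's implementation. The function name `resolve_catalog` is hypothetical, and treating an empty string as "not set" is an assumption.

```python
from typing import Mapping, Optional


def resolve_catalog(topic_props: Mapping[str, str],
                    namespace_props: Mapping[str, str],
                    default_catalog: Optional[str]) -> Optional[str]:
    """Return the catalog for a topic: topic property, then namespace property, then default."""
    # `or` skips both missing keys and empty strings (treating empty as unset is an assumption).
    return (topic_props.get("catalog.name")
            or namespace_props.get("catalog.name")
            or default_catalog)
```

For example, `resolve_catalog({}, {"catalog.name": "polaris-prod"}, "s3table-analytics")` returns `"polaris-prod"`, because the namespace property takes precedence over the default.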

Iceberg Catalogs

Unity Catalog (Managed Iceberg Table)

compactionScheduler:
  config:
    custom:
      catalog.default: <catalog-name>
      iceberg.catalog.<catalog-name>.catalog-backend: "UNITYCATALOG"
      iceberg.catalog.<catalog-name>.type: "rest"
      iceberg.catalog.<catalog-name>.uri: "https://<workspace-url>/api/2.1/unity-catalog/iceberg-rest"
      iceberg.catalog.<catalog-name>.warehouse: "<catalog-name-in-databricks>"
      iceberg.catalog.<catalog-name>.credential: "<access-token>"
      iceberg.catalog.<catalog-name>.oauth2-server-uri: "https://<workspace-url>/oidc/v1/token"
      iceberg.catalog.<catalog-name>.scope: "all-apis"
      iceberg.catalog.<catalog-name>.security: "OAUTH2"
      iceberg.catalog.<catalog-name>.vended-credentials-enabled: "true"
      iceberg.catalog.<catalog-name>.token-refresh-enabled: "true"

Property	Description
`catalog-backend`	`UNITYCATALOG`
`type`	`rest`
`uri`	Unity Catalog Iceberg REST endpoint on the Databricks workspace
`warehouse`	Catalog name created in Databricks
`credential`	Databricks access token
`oauth2-server-uri`	Databricks OAuth2 token endpoint
`scope`	`all-apis`
`security`	`OAUTH2`
`vended-credentials-enabled`	`true`
`token-refresh-enabled`	`true`

Snowflake Horizon Catalog

compactionScheduler:
  config:
    custom:
      catalog.default: <catalog-name>
      iceberg.catalog.<catalog-name>.catalog-backend: "HORIZON"
      iceberg.catalog.<catalog-name>.type: "rest"
      iceberg.catalog.<catalog-name>.uri: "https://<org>-<account>.snowflakecomputing.com/polaris/api/catalog"
      iceberg.catalog.<catalog-name>.credential: "<PAT-token>"
      iceberg.catalog.<catalog-name>.scope: "session:role:<role>"
      iceberg.catalog.<catalog-name>.warehouse: "<database-name>"
      iceberg.catalog.<catalog-name>.header.X-Iceberg-Access-Delegation: "vended-credentials"
      iceberg.catalog.<catalog-name>.token-refresh-enabled: "true"

Property	Description
`catalog-backend`	`HORIZON`
`uri`	Snowflake Horizon REST API endpoint
`credential`	PAT token
`scope`	Snowflake role scope (e.g., `session:role:PUBLIC`)
`warehouse`	Snowflake database name
`header.X-Iceberg-Access-Delegation`	`vended-credentials` (required)
`token-refresh-enabled`	`true` (recommended)

Snowflake Open Catalog (Polaris)

compactionScheduler:
  config:
    custom:
      catalog.default: <catalog-name>
      iceberg.catalog.<catalog-name>.catalog-backend: "POLARIS"
      iceberg.catalog.<catalog-name>.type: "rest"
      iceberg.catalog.<catalog-name>.uri: "https://<account>.snowflakecomputing.com/polaris/api/catalog"
      iceberg.catalog.<catalog-name>.credential: "<client-id>:<client-secret>"
      iceberg.catalog.<catalog-name>.warehouse: "<catalog-name>"
      iceberg.catalog.<catalog-name>.header.X-Iceberg-Access-Delegation: "vended-credentials"
      iceberg.catalog.<catalog-name>.scope: "PRINCIPAL_ROLE:ALL"
      iceberg.catalog.<catalog-name>.token-refresh-enabled: "true"

Property	Description
`catalog-backend`	`POLARIS`
`credential`	Client ID and secret in `<id>:<secret>` format
`warehouse`	Polaris catalog name
`header.X-Iceberg-Access-Delegation`	`vended-credentials`
`scope`	`PRINCIPAL_ROLE:ALL`
`token-refresh-enabled`	`true`

AWS S3Table

compactionScheduler:
  config:
    custom:
      catalog.default: <catalog-name>
      iceberg.catalog.<catalog-name>.catalog-backend: "S3TABLE"
      iceberg.catalog.<catalog-name>.type: "rest"
      iceberg.catalog.<catalog-name>.rest.sigv4-enabled: "true"
      iceberg.catalog.<catalog-name>.rest.signing-name: "s3tables"
      iceberg.catalog.<catalog-name>.rest.signing-region: "<region>"
      iceberg.catalog.<catalog-name>.uri: "https://s3tables.<region>.amazonaws.com/iceberg"
      iceberg.catalog.<catalog-name>.warehouse: "arn:aws:s3tables:<region>:<account>:bucket/<bucket-name>"
      iceberg.catalog.<catalog-name>.rest-metrics-reporting-enabled: "false"

Property	Description
`catalog-backend`	`S3TABLE`
`rest.sigv4-enabled`	`true` (required for AWS SigV4 auth)
`rest.signing-name`	`s3tables`
`rest.signing-region`	AWS region of the S3Table bucket
`uri`	S3Tables REST endpoint (varies by region)
`warehouse`	S3Table bucket ARN
`rest-metrics-reporting-enabled`	`false` (S3Table does not support metric reporting)

Important: The Ursa cluster must run in the same region as the S3Table bucket.
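
The region appears in three S3Table properties (`rest.signing-region`, `uri`, and `warehouse`), so a mismatch is easy to introduce by hand. As a rough sketch, the Python helper below derives all three from a single region value. The function name `s3table_properties` and the sample arguments are hypothetical; the property names and value formats are taken from the configuration above.

```python
def s3table_properties(catalog, region, account_id, bucket):
    """Build the S3Table catalog properties from one region value."""
    prefix = f"iceberg.catalog.{catalog}."
    return {
        prefix + "catalog-backend": "S3TABLE",
        prefix + "type": "rest",
        prefix + "rest.sigv4-enabled": "true",
        prefix + "rest.signing-name": "s3tables",
        prefix + "rest.signing-region": region,
        prefix + "uri": f"https://s3tables.{region}.amazonaws.com/iceberg",
        prefix + "warehouse": f"arn:aws:s3tables:{region}:{account_id}:bucket/{bucket}",
        prefix + "rest-metrics-reporting-enabled": "false",
    }


# Hypothetical values for illustration only.
props = s3table_properties("s3table-analytics", "us-east-2", "123456789012", "analytics")
```

The resulting dictionary can be serialized into the `custom` section with any YAML library.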

Google BigLake

compactionScheduler:
  config:
    custom:
      catalog.default: <catalog-name>
      iceberg.catalog.<catalog-name>.catalog-backend: "BIGLAKE"
      iceberg.catalog.<catalog-name>.type: "rest"
      iceberg.catalog.<catalog-name>.uri: "https://biglake.googleapis.com/iceberg/v1/restcatalog"
      iceberg.catalog.<catalog-name>.warehouse: "gs://<bucket-name>"
      iceberg.catalog.<catalog-name>.header.x-goog-user-project: "<gcp-project-id>"
      iceberg.catalog.<catalog-name>.rest.auth.type: "org.apache.iceberg.gcp.auth.GoogleAuthManager"
      iceberg.catalog.<catalog-name>.io-impl: "org.apache.iceberg.gcp.gcs.GCSFileIO"
      iceberg.catalog.<catalog-name>.rest-metrics-reporting-enabled: "false"
      iceberg.catalog.<catalog-name>.header.X-Iceberg-Access-Delegation: "vended-credentials"

Property	Description
`catalog-backend`	`BIGLAKE`
`warehouse`	GCS bucket path from BigLake catalog properties
`header.x-goog-user-project`	GCP project ID from BigLake catalog properties
`rest.auth.type`	`org.apache.iceberg.gcp.auth.GoogleAuthManager` (fixed)
`io-impl`	`org.apache.iceberg.gcp.gcs.GCSFileIO` (fixed)
`header.X-Iceberg-Access-Delegation`	`vended-credentials` (fixed)

Delta Lake Catalogs

Unity Catalog (Delta)

compactionScheduler:
  config:
    custom:
      catalog.default: <catalog-name>
      delta.catalog.<catalog-name>.unityCatalogUri: "https://<workspace-url>"
      delta.catalog.<catalog-name>.unityCatalogName: "<catalog-name-in-databricks>"
      delta.catalog.<catalog-name>.unityCatalogToken: "<access-token>"

Authentication Options

Token-based (recommended):

delta.catalog.<catalog-name>.unityCatalogToken: "<token>"
# OR from file:
delta.catalog.<catalog-name>.unityCatalogTokenFile: "/path/to/token/file"

OAuth2 (machine-to-machine):

delta.catalog.<catalog-name>.unityCatalogClientId: "<client-id>"
delta.catalog.<catalog-name>.unityCatalogClientSecret: "<client-secret>"

BYOL (Bring Your Own Lakehouse)

Enable managed commit support for Unity Catalog:

# Delta Lake
delta.catalog.<catalog-name>.unityCatalogByolEnabled: "true"

# Iceberg
iceberg.catalog.<catalog-name>.unityCatalogByolEnabled: "true"

Without Catalog (Direct Bucket)

If you do not need an external catalog service, data can be written directly to the object storage bucket.

Required permissions: When no external catalog is used, the compaction-scheduler pod’s IAM role (AWS), service account (GCP), or workload identity (Azure) must have read, write, create, and list permissions on the target bucket. Without an external catalog, the compaction service interacts with object storage directly to create namespaces, write metadata, list existing files, and read prior snapshots. Examples of the required permissions per cloud:

Cloud	Permissions
AWS S3	`s3:GetObject`, `s3:PutObject`, `s3:DeleteObject`, `s3:ListBucket`, `s3:GetBucketLocation` on the warehouse bucket and prefix
GCS	`storage.buckets.get`, `storage.objects.get`, `storage.objects.list`, `storage.objects.create`, `storage.objects.delete` (or the `Storage Object Admin` role)
Azure Blob / ADLS	`Storage Blob Data Contributor` on the container
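
For the AWS row, the listed actions split into bucket-level actions (`s3:ListBucket`, `s3:GetBucketLocation`), which apply to the bucket ARN, and object-level actions, which apply to object ARNs under the prefix. The Python sketch below generates one possible IAM policy document along those lines. The function name, bucket name, and prefix are hypothetical, and the policy should be reviewed against your own security requirements before use.

```python
import json


def s3_warehouse_policy(bucket, prefix=""):
    """Build an IAM policy granting the listed S3 actions on a warehouse bucket and prefix."""
    prefix = prefix.strip("/")
    # Object-level actions are scoped to the prefix when one is given.
    object_arn = f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "BucketLevel",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "ObjectLevel",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": object_arn,
            },
        ],
    }


# Hypothetical bucket and prefix for illustration only.
policy = s3_warehouse_policy("my-warehouse-bucket", "lakehouse")
print(json.dumps(policy, indent=2))
```

The printed JSON can be attached to the IAM role used by the compaction-scheduler pod.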

Iceberg (Hadoop Catalog)

The default Hadoop catalog writes Iceberg metadata and data files directly to the configured storage path. No external catalog service is required.

compactionScheduler:
  config:
    lakehouseType: iceberg
    catalog.default: <catalog-name>
    iceberg.catalog.<catalog-name>.type: "hadoop"
    iceberg.catalog.<catalog-name>.warehouse: "<bucket>/suffix"
    streamTableMode: "EXTERNAL"

Delta (No Unity Catalog)

Delta tables are written directly to the configured storage path without Unity Catalog integration.

compactionScheduler:
  config:
    catalog.default: <catalog-name>
    lakehouseType: delta
    delta.catalog.<catalog-name>.directExternalStoragePath: "<bucket>/suffix"
    streamTableMode: "EXTERNAL"

Multi-Catalog Example

Configure two catalogs (one Polaris, one S3Table) and set a default:

compactionScheduler:
  config:
    custom:
      # Default catalog
      catalog.default: polaris-prod

      # Catalog 1: Snowflake Open Catalog (Polaris)
      iceberg.catalog.polaris-prod.catalog-backend: "POLARIS"
      iceberg.catalog.polaris-prod.type: "rest"
      iceberg.catalog.polaris-prod.uri: "https://xyz.snowflakecomputing.com/polaris/api/catalog"
      iceberg.catalog.polaris-prod.credential: "<client-id>:<client-secret>"
      iceberg.catalog.polaris-prod.warehouse: "prod-catalog"

      # Catalog 2: AWS S3Table
      iceberg.catalog.s3table-analytics.catalog-backend: "S3TABLE"
      iceberg.catalog.s3table-analytics.type: "rest"
      iceberg.catalog.s3table-analytics.rest.sigv4-enabled: "true"
      iceberg.catalog.s3table-analytics.rest.signing-name: "s3tables"
      iceberg.catalog.s3table-analytics.rest.signing-region: "us-east-2"
      iceberg.catalog.s3table-analytics.uri: "https://s3tables.us-east-2.amazonaws.com/iceberg"
      iceberg.catalog.s3table-analytics.warehouse: "arn:aws:s3tables:us-east-2:123456789:bucket/analytics"
      iceberg.catalog.s3table-analytics.rest-metrics-reporting-enabled: "false"

      # Configure SDT
      streamTableMode: "EXTERNAL"
      
      # Configure to use Iceberg
      lakehouseType: "ICEBERG"

Then assign catalogs per namespace or topic:

# Assign polaris-prod to all topics in the public/default namespace
pulsar-admin namespaces set-property -k catalog.name -v polaris-prod public/default

# Override for a specific topic to use S3Table
pulsar-admin topics update-properties \
  -p catalog.name=s3table-analytics \
  persistent://public/default/analytics-topic
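
Before applying a multi-catalog configuration like the one above, it can help to confirm that `catalog.default` and every name assigned through `catalog.name` refer to a catalog that is actually defined. The Python sketch below is one hypothetical way to do that. The function name is invented, and this document does not describe how the service reacts to an undefined name, so the sketch only reports the mismatch.

```python
def undefined_catalog_references(custom, assigned_names=()):
    """Return referenced catalog names that have no iceberg/delta catalog definition."""
    defined = set()
    for key in custom:
        # Keys look like <format>.catalog.<name>.<property>; the name is the third part.
        parts = key.split(".", 3)
        if len(parts) == 4 and parts[0] in ("iceberg", "delta") and parts[1] == "catalog":
            defined.add(parts[2])
    referenced = list(assigned_names)
    if "catalog.default" in custom:
        referenced.append(custom["catalog.default"])
    return sorted({name for name in referenced if name not in defined})


# Trimmed copy of the example configuration, plus one misspelled assignment.
custom = {
    "catalog.default": "polaris-prod",
    "iceberg.catalog.polaris-prod.type": "rest",
    "iceberg.catalog.s3table-analytics.type": "rest",
}
missing = undefined_catalog_references(custom, ["s3table-analytics", "s3table-analytic"])
```

Here `missing` contains only the misspelled name, since both `polaris-prod` and `s3table-analytics` are defined.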

Limitations

A namespace or topic can reference only one catalog at a time; you cannot assign multiple catalogs to a single topic or namespace
Different topics or namespaces can be assigned different catalogs

Next Steps

Enable Lakehouse Integration — Enable SDT at cluster, namespace, or topic level
