
After preparing your external catalog, configure the compaction service to connect to it. Catalog configuration is added to the compactionScheduler.config.custom section of the PulsarBroker YAML.

Multi-Catalog Architecture

StreamNative supports configuring multiple catalogs simultaneously. Different namespaces or topics can route data to different catalogs.

Configuration Pattern

  • Iceberg catalogs: iceberg.catalog.<catalog-name>.<property>
  • Delta catalogs: delta.catalog.<catalog-name>.<property>
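For example, an Iceberg catalog and a Delta catalog could be configured side by side like this (the catalog names `analytics` and `reporting` are hypothetical, chosen for illustration):

```yaml
custom:
  # Iceberg catalog named "analytics"
  iceberg.catalog.analytics.type: "rest"
  iceberg.catalog.analytics.uri: "https://<catalog-endpoint>"
  # Delta catalog named "reporting"
  delta.catalog.reporting.unityCatalogUri: "https://<workspace-url>"
```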

Default Catalog

Set the default catalog used when no topic or namespace override is specified:
custom:
  catalog.default: <catalog-name>

Catalog Resolution Priority

The catalog used for a topic is resolved in this order:
Topic property (catalog.name)
    ↓ (if not set)
Namespace property (catalog.name)
    ↓ (if not set)
Default catalog (catalog.default)
See Enable Lakehouse Integration for how to assign catalogs at namespace and topic level.
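The resolution order above can be sketched as follows. This is an illustrative sketch, not StreamNative's actual implementation; the function and variable names are invented:

```python
def resolve_catalog(topic_props, namespace_props, default_catalog):
    """Resolve which catalog a topic routes to: the topic-level
    property wins, then the namespace-level property, then the
    configured default catalog."""
    return (topic_props.get("catalog.name")
            or namespace_props.get("catalog.name")
            or default_catalog)

# A topic-level override takes precedence over both fallbacks
print(resolve_catalog({"catalog.name": "s3table-analytics"}, {}, "polaris-prod"))
# → s3table-analytics
```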

Iceberg Catalogs

Unity Catalog (Managed Iceberg Table)

compactionScheduler:
  config:
    custom:
      catalog.default: <catalog-name>
      iceberg.catalog.<catalog-name>.catalog-backend: "UNITYCATALOG"
      iceberg.catalog.<catalog-name>.type: "rest"
      iceberg.catalog.<catalog-name>.uri: "https://<workspace-url>/api/2.1/unity-catalog/iceberg-rest"
      iceberg.catalog.<catalog-name>.warehouse: "<catalog-name-in-databricks>"
      iceberg.catalog.<catalog-name>.credential: "<access-token>"
      iceberg.catalog.<catalog-name>.oauth2-server-uri: "https://<workspace-url>/oidc/v1/token"
      iceberg.catalog.<catalog-name>.scope: "all-apis"
      iceberg.catalog.<catalog-name>.security: "OAUTH2"
      iceberg.catalog.<catalog-name>.vended-credentials-enabled: "true"
      iceberg.catalog.<catalog-name>.token-refresh-enabled: "true"
  • catalog-backend: UNITYCATALOG
  • type: rest
  • uri: Databricks workspace URL
  • warehouse: Catalog name created in Databricks
  • credential: Databricks access token
  • oauth2-server-uri: Databricks OAuth2 service URI
  • scope: all-apis
  • security: OAUTH2
  • vended-credentials-enabled: true
  • token-refresh-enabled: true

Snowflake Horizon Catalog

compactionScheduler:
  config:
    custom:
      catalog.default: <catalog-name>
      iceberg.catalog.<catalog-name>.catalog-backend: "HORIZON"
      iceberg.catalog.<catalog-name>.type: "rest"
      iceberg.catalog.<catalog-name>.uri: "https://<org>-<account>.snowflakecomputing.com/polaris/api/catalog"
      iceberg.catalog.<catalog-name>.credential: "<PAT-token>"
      iceberg.catalog.<catalog-name>.scope: "session:role:<role>"
      iceberg.catalog.<catalog-name>.warehouse: "<database-name>"
      iceberg.catalog.<catalog-name>.header.X-Iceberg-Access-Delegation: "vended-credentials"
      iceberg.catalog.<catalog-name>.token-refresh-enabled: "true"
  • catalog-backend: HORIZON
  • uri: Snowflake Horizon REST API endpoint
  • credential: PAT token
  • scope: Snowflake role scope (e.g., session:role:PUBLIC)
  • warehouse: Snowflake database name
  • header.X-Iceberg-Access-Delegation: vended-credentials (required)
  • token-refresh-enabled: true (recommended)

Snowflake Open Catalog (Polaris)

compactionScheduler:
  config:
    custom:
      catalog.default: <catalog-name>
      iceberg.catalog.<catalog-name>.catalog-backend: "POLARIS"
      iceberg.catalog.<catalog-name>.type: "rest"
      iceberg.catalog.<catalog-name>.uri: "https://<account>.snowflakecomputing.com/polaris/api/catalog"
      iceberg.catalog.<catalog-name>.credential: "<client-id>:<client-secret>"
      iceberg.catalog.<catalog-name>.warehouse: "<catalog-name>"
      iceberg.catalog.<catalog-name>.header.X-Iceberg-Access-Delegation: "vended-credentials"
      iceberg.catalog.<catalog-name>.scope: "PRINCIPAL_ROLE:ALL"
      iceberg.catalog.<catalog-name>.token-refresh-enabled: "true"
  • catalog-backend: POLARIS
  • credential: Client ID and secret in <id>:<secret> format
  • warehouse: Polaris catalog name
  • header.X-Iceberg-Access-Delegation: vended-credentials
  • scope: PRINCIPAL_ROLE:ALL
  • token-refresh-enabled: true

AWS S3Table

compactionScheduler:
  config:
    custom:
      catalog.default: <catalog-name>
      iceberg.catalog.<catalog-name>.catalog-backend: "S3TABLE"
      iceberg.catalog.<catalog-name>.type: "rest"
      iceberg.catalog.<catalog-name>.rest.sigv4-enabled: "true"
      iceberg.catalog.<catalog-name>.rest.signing-name: "s3tables"
      iceberg.catalog.<catalog-name>.rest.signing-region: "<region>"
      iceberg.catalog.<catalog-name>.uri: "https://s3tables.<region>.amazonaws.com/iceberg"
      iceberg.catalog.<catalog-name>.warehouse: "arn:aws:s3tables:<region>:<account>:bucket/<bucket-name>"
      iceberg.catalog.<catalog-name>.rest-metrics-reporting-enabled: "false"
  • catalog-backend: S3TABLE
  • rest.sigv4-enabled: true (required for AWS SigV4 auth)
  • rest.signing-name: s3tables
  • rest.signing-region: AWS region of the S3Table bucket
  • uri: S3Tables REST endpoint (varies by region)
  • warehouse: S3Table bucket ARN
  • rest-metrics-reporting-enabled: false (S3Table does not support metric reporting)
Important: The Ursa cluster must run in the same region as the S3Table bucket.

Google BigLake

compactionScheduler:
  config:
    custom:
      catalog.default: <catalog-name>
      iceberg.catalog.<catalog-name>.catalog-backend: "BIGLAKE"
      iceberg.catalog.<catalog-name>.type: "rest"
      iceberg.catalog.<catalog-name>.uri: "https://biglake.googleapis.com/iceberg/v1/restcatalog"
      iceberg.catalog.<catalog-name>.warehouse: "gs://<bucket-name>"
      iceberg.catalog.<catalog-name>.header.x-goog-user-project: "<gcp-project-id>"
      iceberg.catalog.<catalog-name>.rest.auth.type: "org.apache.iceberg.gcp.auth.GoogleAuthManager"
      iceberg.catalog.<catalog-name>.io-impl: "org.apache.iceberg.gcp.gcs.GCSFileIO"
      iceberg.catalog.<catalog-name>.rest-metrics-reporting-enabled: "false"
      iceberg.catalog.<catalog-name>.header.X-Iceberg-Access-Delegation: "vended-credentials"
  • catalog-backend: BIGLAKE
  • warehouse: GCS bucket path from BigLake catalog properties
  • header.x-goog-user-project: GCP project ID from BigLake catalog properties
  • rest.auth.type: org.apache.iceberg.gcp.auth.GoogleAuthManager (fixed)
  • io-impl: org.apache.iceberg.gcp.gcs.GCSFileIO (fixed)
  • header.X-Iceberg-Access-Delegation: vended-credentials (fixed)

Delta Lake Catalogs

Unity Catalog (Delta)

compactionScheduler:
  config:
    custom:
      catalog.default: <catalog-name>
      delta.catalog.<catalog-name>.unityCatalogUri: "https://<workspace-url>"
      delta.catalog.<catalog-name>.unityCatalogName: "<catalog-name-in-databricks>"
      delta.catalog.<catalog-name>.unityCatalogToken: "<access-token>"

Authentication Options

Token-based (recommended):
delta.catalog.<catalog-name>.unityCatalogToken: "<token>"
# OR from file:
delta.catalog.<catalog-name>.unityCatalogTokenFile: "/path/to/token/file"
OAuth2 (machine-to-machine):
delta.catalog.<catalog-name>.unityCatalogClientId: "<client-id>"
delta.catalog.<catalog-name>.unityCatalogClientSecret: "<client-secret>"
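Whichever variant you choose, the properties sit under the same compactionScheduler.config.custom section as the other Delta catalog settings. A minimal sketch using the token-file option (the mount path /mnt/secrets/unity-token is hypothetical; use wherever your secret is mounted):

```yaml
compactionScheduler:
  config:
    custom:
      catalog.default: <catalog-name>
      delta.catalog.<catalog-name>.unityCatalogUri: "https://<workspace-url>"
      delta.catalog.<catalog-name>.unityCatalogName: "<catalog-name-in-databricks>"
      # Read the token from a mounted secret instead of inlining it in the YAML
      delta.catalog.<catalog-name>.unityCatalogTokenFile: "/mnt/secrets/unity-token"
```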

BYOL (Bring Your Own Lakehouse)

Enable managed commit support for Unity Catalog:
# Delta Lake
delta.catalog.<catalog-name>.unityCatalogByolEnabled: "true"

# Iceberg
iceberg.catalog.<catalog-name>.unityCatalogByolEnabled: "true"

Without Catalog (Direct Bucket)

If you do not need an external catalog service, data can be written directly to the object storage bucket.
Required permissions: When no external catalog is used, the compaction-scheduler pod’s IAM role (AWS), service account (GCP), or workload identity (Azure) must have read, write, create, and list permissions on the target bucket. Without an external catalog, the compaction service interacts with object storage directly to create namespaces, write metadata, list existing files, and read prior snapshots. Examples of the required permissions per cloud:
  • AWS S3: s3:GetObject, s3:PutObject, s3:DeleteObject, s3:ListBucket, s3:GetBucketLocation on the warehouse bucket and prefix
  • GCS: storage.buckets.get, storage.objects.get, storage.objects.list, storage.objects.create, storage.objects.delete (or the Storage Object Admin role)
  • Azure Blob / ADLS: Storage Blob Data Contributor on the container
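For AWS, the S3 permissions above translate into an IAM policy along these lines (the bucket name and prefix are placeholders; scope the Resource ARNs to your warehouse path):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CompactionObjectAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::<warehouse-bucket>/<prefix>/*"
    },
    {
      "Sid": "CompactionBucketAccess",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::<warehouse-bucket>"
    }
  ]
}
```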

Iceberg (Hadoop Catalog)

The default Hadoop catalog writes Iceberg metadata and data files directly to the configured storage path. No external catalog service is required.
compactionScheduler:
  config:
    custom:
      lakehouseType: iceberg
      catalog.default: <catalog-name>
      iceberg.catalog.<catalog-name>.type: "hadoop"
      iceberg.catalog.<catalog-name>.warehouse: "<bucket>/suffix"
      streamTableMode: "EXTERNAL"

Delta (No Unity Catalog)

Delta tables are written directly to the configured storage path without Unity Catalog integration.
compactionScheduler:
  config:
    custom:
      catalog.default: <catalog-name>
      lakehouseType: delta
      delta.catalog.<catalog-name>.directExternalStoragePath: "<bucket>/suffix"
      streamTableMode: "EXTERNAL"

Multi-Catalog Example

Configure two catalogs (one Polaris, one S3Table) and set a default:
compactionScheduler:
  config:
    custom:
      # Default catalog
      catalog.default: polaris-prod

      # Catalog 1: Snowflake Open Catalog (Polaris)
      iceberg.catalog.polaris-prod.catalog-backend: "POLARIS"
      iceberg.catalog.polaris-prod.type: "rest"
      iceberg.catalog.polaris-prod.uri: "https://xyz.snowflakecomputing.com/polaris/api/catalog"
      iceberg.catalog.polaris-prod.credential: "<client-id>:<client-secret>"
      iceberg.catalog.polaris-prod.warehouse: "prod-catalog"

      # Catalog 2: AWS S3Table
      iceberg.catalog.s3table-analytics.catalog-backend: "S3TABLE"
      iceberg.catalog.s3table-analytics.type: "rest"
      iceberg.catalog.s3table-analytics.rest.sigv4-enabled: "true"
      iceberg.catalog.s3table-analytics.rest.signing-name: "s3tables"
      iceberg.catalog.s3table-analytics.rest.signing-region: "us-east-2"
      iceberg.catalog.s3table-analytics.uri: "https://s3tables.us-east-2.amazonaws.com/iceberg"
      iceberg.catalog.s3table-analytics.warehouse: "arn:aws:s3tables:us-east-2:123456789:bucket/analytics"
      iceberg.catalog.s3table-analytics.rest-metrics-reporting-enabled: "false"

      # Configure SDT
      streamTableMode: "EXTERNAL"
      
      # Configure to use Iceberg
      lakehouseType: "ICEBERG"
Then assign catalogs per namespace or topic:
# Use default (polaris-prod) for all topics in the namespace
pulsar-admin namespaces set-property -k catalog.name -v polaris-prod public/default

# Override for a specific topic to use S3Table
pulsar-admin topics update-properties \
  -p catalog.name=s3table-analytics \
  persistent://public/default/analytics-topic

Limitations

  • A namespace or topic can reference only one catalog at a time
  • You can assign different catalogs to different topics or namespaces
  • You cannot assign multiple catalogs to a single topic or namespace

Next Steps