Prerequisites
- A Snowflake standard account
- An Azure subscription with permissions to create storage accounts and configure trusted apps
- Access to the Snowflake Open Catalog feature
1. Create a Snowflake Open Catalog Account
The Snowflake Open Catalog console requires a dedicated Open Catalog account. From the standard Snowflake console, navigate to Admin -> Accounts and use the toggle to Create Snowflake Open Catalog Account.

- Cloud: AWS or Azure (per Polaris availability)
- Region: the region in which your storage container resides
- Edition: any




2. Collect Azure Account Information
2.1 Azure Tenant ID
In the Azure portal, search for Tenant properties and record the Tenant ID.

2.2 Storage Service Endpoint
In the Azure portal, navigate to Storage accounts, open the target storage account, and click Settings -> Endpoints. Record the Blob service primary endpoint (for example, https://<account>.blob.core.windows.net/).


2.3 Create a Container
In the storage account, navigate to Data storage -> Containers -> + Container and create a new container.
3. Create the Polaris Catalog
In the Snowflake Open Catalog console, create a new catalog.
Important: The StreamNative compaction service writes data using the AzureDFS protocol, so the Polaris catalog must use the same protocol. Configure the catalog with abfss://<container>@<account>.dfs.core.windows.net (note dfs, not blob).
- External: disabled
- Storage provider: AZURE
- Default base location: abfss://<container>@<storage-account>.dfs.core.windows.net
- Azure tenant ID: the value from step 2.1
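The dfs-versus-blob distinction above is easy to get wrong when copying the endpoint from step 2.2. The following is a minimal sketch, not part of any Snowflake or StreamNative tooling, that derives the default base location from the recorded Blob service endpoint and a container name. The function name `abfss_base_location` and the sample account and container names are hypothetical.

```python
from urllib.parse import urlparse


def abfss_base_location(blob_endpoint: str, container: str, path: str = "") -> str:
    """Build an abfss:// base location from a Blob service endpoint.

    blob_endpoint is the value recorded in step 2.2, for example
    https://<account>.blob.core.windows.net/.
    """
    host = urlparse(blob_endpoint).hostname or ""
    suffix = ".blob.core.windows.net"
    if not host.endswith(suffix):
        raise ValueError(f"not a Blob service endpoint: {blob_endpoint!r}")
    account = host[: -len(suffix)]
    if not account or not container:
        raise ValueError("storage account and container must be non-empty")
    # Swap the blob host for the dfs host so the catalog uses the AzureDFS protocol.
    location = f"abfss://{container}@{account}.dfs.core.windows.net"
    path = path.strip("/")
    return f"{location}/{path}" if path else location


# Hypothetical account and container names, for illustration only.
print(abfss_base_location("https://mystorageacct.blob.core.windows.net/", "ursa-data"))
```

Rejecting anything that is not a blob endpoint is a deliberate choice in this sketch: it surfaces a mis-copied value before it reaches the catalog configuration.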

4. Create a Trusted App in Azure
Open the catalog details and record the values of AZURE_CONSENT_URL and AZURE_MULTI_TENANT_APP_NAME.

Open the AZURE_CONSENT_URL in a browser and click Accept. This redirects to Snowflake and creates a trusted app in Azure. The trusted app name is the portion of AZURE_MULTI_TENANT_APP_NAME before the underscore.
Note: Provisioning the trusted app in Azure can take several minutes.
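The trusted app name is what you search for in Azure when assigning a role in the next step. As a small sketch of the rule stated above, the helper below takes the text before the underscore; it assumes the value contains a single separating underscore and splits on the first one. The function name and the sample value are hypothetical and do not reflect a real app name.

```python
def trusted_app_name(multi_tenant_app_name: str) -> str:
    """Return the portion of AZURE_MULTI_TENANT_APP_NAME before the underscore."""
    # Assumption: the name and its suffix are separated by the first underscore.
    name, sep, _suffix = multi_tenant_app_name.partition("_")
    if not sep or not name:
        raise ValueError("expected a value of the form <app-name>_<suffix>")
    return name


# Made-up value, for illustration only.
print(trusted_app_name("exampleapp_1234567890"))
```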
5. Grant Container Permissions to the Trusted App
In the Azure portal, navigate to the storage account’s Access Control (IAM) -> + Add -> Add role assignment.



6. Create a Connection (Service Principal)
In the Open Catalog console, create a new connection that StreamNative Ursa will use to authenticate.
- Name: any name
- Create new principal role: enabled
- Principal Role Name: any name


7. Create a Catalog Role and Grant Privileges
Navigate to Catalogs -> [your catalog] -> Roles -> + Catalog Role and create a role with the following privileges:
- NAMESPACE_CREATE
- NAMESPACE_READ_PROPERTIES
- TABLE_CREATE
- TABLE_WRITE_DATA
- TABLE_READ_DATA



Catalog Information Summary
When the steps above are complete, collect the following values for the StreamNative Ursa compaction service:

| Value | Description |
|---|---|
| iceberg.uri | Polaris REST endpoint (e.g., https://<account>.<region>.snowflakecomputing.com/polaris/api/catalog). The format follows the URL of your Polaris console. |
| iceberg.warehouse | The Polaris catalog name created in step 3 |
| iceberg.credential | <client-id>:<client-secret> from step 6 |
| iceberg.scope | PRINCIPAL_ROLE:ALL |
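Before handing these values over, it can help to assemble them in one place and catch obvious mistakes, such as a missing secret or a non-HTTPS URI. The sketch below is only an illustration: the function name `ursa_catalog_properties` and the sample values are hypothetical, and how the resulting mapping is supplied to the Ursa compaction service is outside its scope.

```python
from urllib.parse import urlparse


def ursa_catalog_properties(uri, warehouse, client_id, client_secret,
                            scope="PRINCIPAL_ROLE:ALL"):
    """Collect the four values from the summary table into one mapping."""
    parsed = urlparse(uri)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"iceberg.uri should be an https URL, got {uri!r}")
    if not warehouse:
        raise ValueError("iceberg.warehouse (the catalog name from step 3) is empty")
    if not client_id or not client_secret or ":" in client_id:
        raise ValueError("client ID and secret must be non-empty; the ID must not contain ':'")
    return {
        "iceberg.uri": uri.rstrip("/"),
        "iceberg.warehouse": warehouse,
        # The credential is the client ID and secret joined by a colon.
        "iceberg.credential": f"{client_id}:{client_secret}",
        "iceberg.scope": scope,
    }


# Placeholder values, for illustration only.
props = ursa_catalog_properties(
    "https://myaccount.myregion.snowflakecomputing.com/polaris/api/catalog",
    "ursa_catalog",
    "my-client-id",
    "my-client-secret",
)
for key, value in props.items():
    # Avoid printing the secret in logs.
    print(key, "=", "<redacted>" if key == "iceberg.credential" else value)
```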
Table Maintenance
Snowflake Open Catalog (Polaris) and the Hadoop catalog do not run table maintenance on your behalf. Streaming writes from the StreamNative Ursa compaction service produce many small Parquet files and accumulate snapshot history over time, which degrades query performance and inflates storage costs. You are responsible for scheduling and running maintenance against every Iceberg table written by Ursa.

Run the maintenance operations below on a regular schedule. They are provided as Apache Iceberg Spark stored procedures and can be triggered from any Spark cluster (Databricks, AWS EMR, AWS Glue, GCP Dataproc, or self-managed Spark) that has the Iceberg Spark runtime, catalog credentials, and IAM access to the warehouse bucket.

Maintenance operations

| Operation | Purpose | Suggested cadence |
|---|---|---|
| rewrite_data_files | Compact small Parquet files into fewer, larger files. Reduces file-listing overhead and improves scan performance. | Hourly to daily, depending on ingestion rate |
| expire_snapshots | Drop snapshots older than the retention window so their data and manifest files can be cleaned up. | Daily; retain at least 1–7 days so in-flight readers and time-travel queries keep working |
| remove_orphan_files | Delete files in the table location that are no longer referenced by any snapshot (typically left behind by failed or partial writes). | Weekly |
| rewrite_manifests | Rewrite manifest files so they align with the current partition layout. Improves query planning time. | Weekly, or after large schema or partition changes |
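The four operations in the table map to Iceberg Spark procedures invoked with `CALL <catalog>.system.<procedure>(...)`. As a hedged sketch, the helper below builds one statement per operation with only the most basic arguments. The function name `maintenance_statements`, the 7-day default retention, and the sample identifiers are assumptions for illustration; the cutoff is rendered in UTC, so it assumes the Spark session time zone is also UTC.

```python
from datetime import datetime, timedelta, timezone


def maintenance_statements(catalog, namespace, table, retention_days=7, now=None):
    """Return Spark SQL CALL statements for the four maintenance operations."""
    now = now or datetime.now(timezone.utc)
    # Snapshots and orphan files older than this cutoff become eligible for removal.
    cutoff = (now - timedelta(days=retention_days)).strftime("%Y-%m-%d %H:%M:%S")
    ident = f"{namespace}.{table}"
    return [
        f"CALL {catalog}.system.rewrite_data_files(table => '{ident}')",
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{ident}', older_than => TIMESTAMP '{cutoff}')",
        f"CALL {catalog}.system.remove_orphan_files("
        f"table => '{ident}', older_than => TIMESTAMP '{cutoff}')",
        f"CALL {catalog}.system.rewrite_manifests(table => '{ident}')",
    ]


# Hypothetical catalog, namespace, and table names.
for statement in maintenance_statements("polaris", "ursa", "events"):
    print(statement)
```

In a Spark job, each string would typically be passed to `spark.sql(...)` on an existing session. A real schedule would likely run the statements at the different cadences suggested in the table rather than all together.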
Replace <catalog>, <namespace>, and <table> with your values when calling these procedures.
- Credentials. The principal that runs maintenance must have catalog privileges to read and write the target table (for example, the same TABLE_READ_DATA, TABLE_WRITE_DATA, TABLE_READ_PROPERTIES, and TABLE_WRITE_PROPERTIES privileges configured for the Ursa compaction service) and IAM access to the warehouse bucket so it can read and rewrite the underlying data files. With the Hadoop catalog there is no catalog service to authenticate against; only the bucket IAM access is required.
- Concurrency. Iceberg uses optimistic concurrency control. If maintenance commits race with the Ursa compaction writer, one of them retries. Schedule heavy operations (rewrite_data_files, rewrite_manifests) during low-write windows when possible.
- Retention vs. time travel. expire_snapshots and remove_orphan_files permanently delete files. Choose a retention window that exceeds the longest expected read query and your time-travel SLA.
- Schedule the workload. Most teams orchestrate these procedures from Databricks Jobs, AWS EMR steps, Airflow, Dagster, or a Kubernetes CronJob. Pick a scheduler that fits your existing operational stack.
- Reference. See the Iceberg Spark procedures documentation for the full parameter list, including options for partial rewrites (where), file-size targets, and merge-on-read delete file compaction.
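For the Spark cluster that runs maintenance, the catalog values collected earlier have to reach the Iceberg Spark runtime as catalog properties. The sketch below shows one plausible mapping onto Iceberg's REST catalog settings (`type`, `uri`, `warehouse`, `credential`, `scope`). The function names and sample values are hypothetical, storage access settings for the warehouse location are deliberately left out, and your platform may require additional properties.

```python
def spark_catalog_conf(catalog, uri, warehouse, credential, scope="PRINCIPAL_ROLE:ALL"):
    """Map the catalog summary values onto Spark properties for an Iceberg REST catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables the CALL syntax used by the maintenance procedures.
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.credential": credential,
        f"{prefix}.scope": scope,
    }


def to_submit_args(conf):
    """Render the properties as spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Placeholder values, for illustration only.
conf = spark_catalog_conf(
    "polaris",
    "https://myaccount.myregion.snowflakecomputing.com/polaris/api/catalog",
    "ursa_catalog",
    "my-client-id:my-client-secret",
)
print(len(to_submit_args(conf)) // 2, "properties")
```

The catalog name passed here is the one used as the `<catalog>` prefix in the procedure calls. In practice the credential is better read from a secret store than passed on a command line.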