This guide offers a detailed walkthrough for integrating StreamNative Cloud with Snowflake Open Catalog. It covers essential aspects such as configuring authentication, storage buckets, catalogs, and other key components. Snowflake Open Catalog Integration is available with StreamNative BYOC Ursa clusters, which can be deployed into your AWS, GCP, or Azure cloud account. Specific directions are included for each cloud provider where applicable. By following this guide, you will enable seamless interaction between StreamNative Cloud and Snowflake Open Catalog.
Before initiating the integration of Snowflake Open Catalog with StreamNative Cloud, please ensure the following steps are completed. You can also watch this video to learn more: Preparing Snowflake Open Catalog (AWS Example).
Create a Snowflake AI Data Cloud account. The homepage of a Snowflake AI Data Cloud account will look as follows.
To access the Snowflake Open Catalog console, a specialized Open Catalog account must be created. This account type is specifically designed for managing Open Catalog features and functionality.
Enter Admin → Accounts → Toggle → Create Snowflake Open Catalog Account
Configure the Snowflake Open Catalog
The Snowflake Open Catalog, storage bucket, and StreamNative BYOC Ursa cluster should be in the same cloud provider and region. Snowflake Open Catalog doesn’t support cross-region buckets. To avoid costs associated with cross-region traffic, we highly recommend your storage bucket and StreamNative BYOC Ursa cluster are in the same region.
Edition: any
Next, input a Snowflake Open Catalog Account Name, User Name, Password, and Email. This will create a new user for use specifically with the Snowflake Open Catalog Account.
Click Create Account. You will see the following if account creation is successful. We highly recommend taking a screenshot of this confirmation message. This Account Locator URL will be used in later steps.
Click the Account URL, then sign into your open catalog account. You will enter the Snowflake Open Catalog console.
If you need the Account URL of your Snowflake Open Catalog Account in the future, navigate to Admin → Accounts → … → Manage URLs of your Snowflake Account. This page is available in your Snowflake AI Data Cloud Account. The Locator column, in combination with the Region, can be used to construct the Account Locator URL. The Account Locator URL will be needed when configuring the StreamNative BYOC Ursa cluster.
URI: the Account Locator URL recorded when creating the Snowflake Open Catalog account, with ‘/polaris/api/catalog’ appended. The StreamNative UI requires the Account Locator URL, not the Account URL. The Account Locator URL will be in the following format.
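Based on Snowflake's account locator URL scheme, it typically takes this shape (some AWS regions omit the cloud segment):

```
https://<account-locator>.<region>.<cloud>.snowflakecomputing.com/polaris/api/catalog
```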
Choose a bucket location and grant access to StreamNative Cloud. You have two options to set up a storage bucket.
The Snowflake Open Catalog, storage bucket, and StreamNative BYOC Ursa cluster should be in the same cloud provider and region. Snowflake Open Catalog doesn’t support cross-region buckets. To avoid costs associated with cross-region traffic, we highly recommend your storage bucket and StreamNative BYOC Ursa cluster are in the same region.
Option 1: Use your own bucket (recommended)
You need to create your own storage bucket, with the option to create a bucket path. When using your own bucket, the resulting path you will use for creation of the Snowflake Open Catalog will be as follows. The compaction folder will be created automatically by the StreamNative cluster.
AWS
s3://<your-bucket-name>/<your-bucket-path>/compaction
GCP
gs://<your-bucket-name>/<your-bucket-path>/compaction
Azure: The option to use your own bucket is currently not available on Azure.
StreamNative will require access to this storage bucket. To grant access, execute the following Terraform module based on your cloud provider.
Terraform module for AWS
Terraform module for GCP
You can find your organization name in the StreamNative console, as shown below:
Before executing the Terraform module, you must authenticate your console with your cloud provider. These variables grant your console access to the cloud service provider account where the storage bucket is located.
AWS
Learn more about how to configure the AWS CLI here.
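As a minimal sketch, assuming the AWS CLI is installed and you have credentials for the account that owns the bucket:

```bash
# Configure default credentials interactively (access key, secret key, region)
aws configure

# Or point Terraform at an existing named profile
export AWS_PROFILE=<your-profile>
```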
GCP
Authenticate using a user account with the following commands; see the linked documentation for details.
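A typical sequence, assuming the gcloud CLI is installed:

```bash
# Sign in with your user account
gcloud auth login

# Create Application Default Credentials, which Terraform picks up automatically
gcloud auth application-default login
```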
Run the Terraform module
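Regardless of provider, the workflow from the directory containing the module configuration is the standard Terraform cycle:

```bash
terraform init    # download the providers and the StreamNative module
terraform plan    # review the IAM changes before applying
terraform apply   # grant StreamNative access to the bucket
```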
Option 2: Use StreamNative provided bucket
This process involves deploying the StreamNative BYOC Cloud Connection and Cloud Environment, then beginning the StreamNative BYOC Ursa Cluster deployment to obtain the cluster id. StreamNative will automatically assign the necessary permissions to this bucket. The StreamNative-provided bucket option is available on AWS, GCP, and Azure.
To proceed, you will need to first complete the steps for granting vendor access, creating a Cloud Connection, and setting up the Cloud Environment. Next, begin the process of deploying the StreamNative BYOC Ursa Cluster to obtain the cluster id. Step 1 of Create StreamNative BYOC Ursa Cluster below includes directions on obtaining the cluster id from the Lakehouse Storage Configuration page.
When using a StreamNative-provided bucket, the resulting path you will use for creation of the Snowflake Open Catalog will be as follows. The cloud environment id will be created during the deployment of the Cloud Environment. The cluster id is assigned when starting the cluster creation process in the StreamNative Console and is found on the Lakehouse Storage Configuration page.
AWS
s3://<your-cloud-environment-id>/<your-cluster-id>/compaction
GCP
gs://<your-cloud-environment-id-tiered-storage>/<your-cluster-id>/compaction
Azure
abfss://tiered-storage@<your-storage-account>.dfs.core.windows.net/<your-organization-and-cluster-id>/compaction
AWS
Create an IAM policy and role for Snowflake Open Catalog access.
In the AWS console, enter Access management → Policies → Create policy
Then choose the JSON format. Enter the policy as follows, replacing <your-bucket-name> and <your-bucket-path>:
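The policy below is a representative sketch based on the S3 permissions Snowflake documents for catalog storage access; verify the action list against the current Snowflake Open Catalog documentation:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::<your-bucket-name>/<your-bucket-path>/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::<your-bucket-name>",
      "Condition": {
        "StringLike": {
          "s3:prefix": ["<your-bucket-path>/*"]
        }
      }
    }
  ]
}
```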
Click Next
Provide a policy name and click Create policy.
Create IAM Role
In the AWS console, enter Access management → Roles → Create role
Click Next
Select the policy created in the previous step. Then click Next
Input a role name and click Create role.
View the detailed role information and record the ARN
This policy and role are used for Snowflake Open Catalog access to the s3 bucket.
GCP
Create a role for Snowflake Open Catalog bucket access.
Navigate to IAM → Roles → Create role
Provide a role title and ID (e.g. streamnative_pulsar_open_catalog).
Provide the following permissions:
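A typical permission set, matching what Snowflake documents for Open Catalog read/write access to GCS (verify against the current Snowflake documentation):

```
storage.buckets.get
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.list
```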
Click Create. This role will be used by Snowflake Open Catalog to access the bucket.
Azure
We need to obtain the Azure Tenant ID before continuing to the next step
In Azure console, search for Tenant properties and select Tenant properties
Record the Tenant ID
Create Snowflake Open Catalog
AWS
User provided bucket:
s3://<your-bucket-name>/<your-bucket-path>/compaction
StreamNative provided bucket:
s3://<your-cloud-environment-id>/<your-cluster-id>/compaction
Then click Create. You will see the catalog streamnative created.
View the catalog details and capture the value of the IAM user ARN. The Snowflake Open Catalog will use this ARN to access our AWS bucket.
Trust the Snowflake Open Catalog IAM user ARN
In the AWS console, enter Access management → Roles and search for the role created earlier.
Then click Trust relationships → Edit trust policy
Change the value of Principal:AWS to the Snowflake Open Catalog IAM user ARN.
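After the change, the trust policy will look roughly like this minimal sketch; Snowflake may also supply an external ID to add as an sts:ExternalId condition (see the Snowflake documentation):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<snowflake-open-catalog-iam-user-arn>"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```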
Then click Update policy and the Snowflake Open Catalog can access the bucket.
GCP
gs://<your-bucket-name>/<your-bucket-path>/compaction
gs://<your-cloud-environment-id>/<your-cluster-id>/compaction
Additional locations: not configured
Then click Create. You will see the catalog pulsar created.
Select the catalog, then open Catalog Details. Record the value of the GCP_SERVICE_ACCOUNT; the Snowflake Open Catalog will use this account to access our storage bucket.
Navigate to Cloud Storage → Buckets and select the root of the storage bucket.
<your-bucket-name>
<your-cloud-environment-id-tiered-storage>
Select Permissions → Grant access
New principals: paste the GCP_SERVICE_ACCOUNT from the catalog
Role: paste the name of the role created in the previous step (e.g. streamnative_pulsar_open_catalog)
Snowflake Open Catalog now has access to the GCP storage bucket.
Azure
Navigate to Catalog Details and copy the AZURE_CONSENT_URL and AZURE_MULTI_TENANT_APP_NAME.
Navigate to the AZURE_CONSENT_URL in a browser and click Accept. You will be redirected to the Snowflake homepage. A trusted app named after the string prefix of the AZURE_MULTI_TENANT_APP_NAME was created in Azure and will be used in the next step.
Navigate to the storage account of the StreamNative BYOC Ursa Cluster. In our example, the SN Bucket Location is as follows:
abfss://tiered-storage@snpmtscbbad75227e51b50.dfs.core.windows.net/o-mj3r8-c-u6y2cpq-ursa/compaction
The Storage account is therefore: snpmtscbbad75227e51b50
Navigate to Access Control (IAM) → + Add → Add role assignment
Search for Storage Blob Data and select Storage Blob Data Contributor. Click Next.
Click + Select members. Paste in the string prefix of the AZURE_MULTI_TENANT_APP_NAME and select the Application. Click Select.
Click Review + assign twice.
You can verify permissions by selecting Role assignments and searching for the application.
Our engine needs a connection to access the Snowflake Open Catalog, so we need to create one. We will later reuse this connection for Snowflake to access Snowflake Open Catalog.
Then click Create, and you will see a pane. Record the Client ID and Client Secret for this connection as <CLIENT ID>:<SECRET>. Our engine needs it to access the Snowflake Open Catalog.
We now have a Service Connection called streamnativeconnection linked to the Principal Role streamnativeprincipal.
Create a Snowflake Catalog Role
Enter Catalogs → select the catalog streamnative (or pulsar) → Roles → + Catalog Role
Name: streamnativeopencatalog
Privileges:
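Typically these are the ten namespace and table read/write privileges below; verify the exact set against the StreamNative documentation, as it may change:

```
NAMESPACE_CREATE
NAMESPACE_LIST
NAMESPACE_READ_PROPERTIES
NAMESPACE_WRITE_PROPERTIES
TABLE_CREATE
TABLE_LIST
TABLE_READ_DATA
TABLE_WRITE_DATA
TABLE_READ_PROPERTIES
TABLE_WRITE_PROPERTIES
```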
Click Create.
Then click Grant to Principal Role
Then click Grant
The catalog role streamnativeopencatalog now has the 10 required permissions on catalog streamnative (or pulsar).
We will reuse the connection when connecting Snowflake to Snowflake Open Catalog.
To proceed, you will need to first complete the steps for granting vendor access, creating a Cloud Connection, and setting up the Cloud Environment. Then you can begin deploying the StreamNative BYOC Ursa Cluster. You can also watch this video to learn more about deploying the StreamNative BYOC Ursa Cluster (AWS Example).
In this section we create and set up a cluster in StreamNative Cloud. Log in to StreamNative Cloud and click ‘Create an instance and deploy cluster’.
Click on Deploy BYOC
Enter Instance name, select your Cloud Connection, select URSA Engine and click on Cluster Location
Enter Cluster Name, select your Cloud Environment, select Multi AZ and click on Lakehouse Storage Configuration
To configure the Storage Location, there are two options:
Option 1: Select Use Your Own Bucket to choose your own storage bucket by entering the following details
AWS
GCP
Option 2: Select Use Existing BYOC Bucket to choose the bucket created by StreamNative (AWS, GCP or Azure)
The UI will present you with the SN Bucket Location in this format to be used when creating the Snowflake Open Catalog.
AWS
s3://<your-cloud-environment-id>/<your-cluster-id>/compaction
e.g.
s3://aws-usw2-test-rni68-tiered-storage-snc/o-naa2l-c-vo06zqe-ursa/compaction
GCP
gs://<your-cloud-environment-id-tiered-storage>/<your-cluster-id>/compaction
e.g.
gs://gcp-usw2-test-hhein-tiered-storage/o-78m1b-c-9ahma2v-ursa/compaction
Azure
abfss://tiered-storage@<your-storage-account>.dfs.core.windows.net/<your-organization-and-cluster-id>/compaction
e.g.
abfss://tiered-storage@snpmtscbbad75227e51b50.dfs.core.windows.net/o-mj3r8-c-u6y2cpq-ursa/compaction
Note: If you are using the StreamNative-provided bucket, do not close the browser while creating the catalog; doing so will cause StreamNative to assign a new cluster id. Once a catalog is created in Snowflake Open Catalog, its base location and additional locations cannot be changed, so if the cluster id changes you will need to create a new catalog.
To integrate with Snowflake Open Catalog, enable Catalog Integration and select Snowflake Open Catalog.
Enter the <CLIENT ID>:<SECRET> recorded earlier when creating the Service Connection.
Clicking Cluster Size will test the connection to the storage bucket and the Snowflake Open Catalog.
Click Continue to begin sizing your cluster.
For this example, we deploy using the smallest cluster size. Click Finish to start deploying the StreamNative BYOC Ursa Cluster into your Cloud Environment.
When cluster deployment is complete, it will appear on the Organization Dashboard with a green circle.
The Lakehouse Storage configuration can be viewed by clicking on the Instance on the Organization Dashboard and selecting Configuration in the left pane.
Follow the creating and running a producer section to produce Kafka messages to a topic.
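As a quick sketch, a console producer invocation might look like the following; the bootstrap URL, tenant, and API key placeholders are hypothetical and come from your cluster's Kafka client details in the StreamNative Console:

```bash
kafka-console-producer.sh \
  --bootstrap-server <your-cluster-kafka-url>:9093 \
  --topic kafkatopic1 \
  --producer-property security.protocol=SASL_SSL \
  --producer-property sasl.mechanism=PLAIN \
  --producer-property 'sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<tenant>" password="token:<your-api-key>";'
```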
AWS
Navigate to the user provided or StreamNative provided s3 bucket. In this example the user provided bucket is s3://streamnativeopencatalog/test. A storage folder and compaction folder have been created by the cluster.
We published messages to multiple topics in the public/default tenant/namespace. We see folders for the tenant, namespace, and each topic inside the compaction folder.
Inside each topic folder, we find partition and metadata folders.
GCP
Navigate to the user provided or StreamNative provided GCP storage bucket. In this example the StreamNative provided bucket is gs://gcp-usw2-test-hhein-tiered-storage/o-78m1b-c-37kll59-ursa. A storage folder and compaction folder have been created by the cluster.
We published messages to topic kafkatopic1 in the public/default tenant/namespace. We see folders for the public tenant, default namespace, and kafkatopic1 topic inside the compaction folder. Inside each topic folder, we find partition and metadata folders.
Azure
Navigate to the tiered-storage container of the Storage account. You will find a subfolder for your organization and cluster (e.g. o-mj3r8-c-u6y2cpq-ursa) with storage and compaction folders. The compaction folder will not be present until a few minutes after publishing data to your cluster.
Inside the compaction folder you will find subfolders for your tenant, namespace, and topics. Inside each topic folder are metadata and partition folders, and the Parquet files are inside the partition folders.
Once we have published messages to a topic and the compaction folder has been created in the storage bucket, we can verify that the tables and schemas are visible in Snowflake Open Catalog. We can see the resulting topics created in streamnative/public/default with a registered schema.
Querying a table in Snowflake Open Catalog using Snowflake requires completing the following steps from the Snowflake documentation. This video shows the detailed queries for the example above (AWS Example).
Please refer to the Snowflake documentation here for the complete code samples for creating an external volume for each cloud provider.
The video includes the following details from our AWS example:
s3://<>/<>/compaction
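As a sketch, the AWS variant of the external volume statement takes roughly this shape; the volume name is illustrative, the role ARN comes from the IAM role created earlier, and the external ID handling is described in the Snowflake documentation:

```sql
-- Illustrative names; replace placeholders with your values.
CREATE OR REPLACE EXTERNAL VOLUME sn_open_catalog_volume
  STORAGE_LOCATIONS = (
    (
      NAME = 'streamnative-s3'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://<your-bucket-name>/<your-bucket-path>/compaction'
      STORAGE_AWS_ROLE_ARN = '<your-iam-role-arn>'   -- role created for Open Catalog access
      STORAGE_AWS_EXTERNAL_ID = '<external-id>'      -- optional; see Snowflake docs
    )
  );
```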
Please refer to the Snowflake documentation here for the complete code samples.
The video includes the following details from our AWS example:
<CLIENT ID>:<SECRET> for Snowflake Open Catalog to allow access for Snowflake. The <CLIENT ID> refers to OAUTH_CLIENT_ID and <SECRET> refers to OAUTH_CLIENT_SECRET. You will need to create a new catalog integration for each tenant.namespace.
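A sketch of the corresponding statement for the public/default namespace in this example; the integration name is illustrative, and WAREHOUSE is the Open Catalog catalog name created earlier:

```sql
-- Illustrative name; one integration is needed per tenant.namespace.
CREATE OR REPLACE CATALOG INTEGRATION sn_open_catalog_int
  CATALOG_SOURCE = POLARIS
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'public.default'      -- tenant.namespace
  REST_CONFIG = (
    CATALOG_URI = '<account-locator-url>/polaris/api/catalog'
    WAREHOUSE = 'streamnative'              -- the Open Catalog catalog name
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_CLIENT_ID = '<CLIENT ID>'
    OAUTH_CLIENT_SECRET = '<SECRET>'
    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE;
```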
Please refer to the Snowflake documentation here for the complete code samples.
The video includes the following details from our AWS example:
You will need to create a new externally managed table for each topic.
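For example, for the topic kafkatopic1 (table and object names are illustrative):

```sql
-- Illustrative names; create one externally managed table per topic.
CREATE OR REPLACE ICEBERG TABLE kafkatopic1
  EXTERNAL_VOLUME = 'sn_open_catalog_volume'
  CATALOG = 'sn_open_catalog_int'
  CATALOG_TABLE_NAME = 'kafkatopic1';
```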
Once these steps are complete, you will be able to query the Iceberg table registered in Snowflake Open Catalog through Snowflake AI Data Cloud.