Introduction
This guide offers a detailed walkthrough for integrating StreamNative Cloud with Databricks Unity Catalog. It covers essential aspects such as configuring authentication, networking, storage buckets, catalogs, and other key components. Databricks Unity Catalog integration is available with StreamNative BYOC Ursa clusters, which can be deployed into your AWS or GCP cloud account. Specific directions are included for each cloud provider where applicable. By following this guide, you will enable seamless interaction between StreamNative Cloud and Databricks Unity Catalog.

Setup Databricks
Before initiating the integration of Databricks with StreamNative Cloud, please ensure the following prerequisites are fulfilled. You can also watch this video to learn more: Preparing Databricks Account (AWS Example).

Cloud Service Provider Permissions:
Setting up a Databricks workspace requires appropriate AWS or GCP permissions. Ensure you are logged into your cloud provider account with an active session and administrative privileges to enable seamless authorization. To simplify the required permissions, we recommend using the same AWS or GCP account you used to create a StreamNative BYOC Cloud Environment.

Step 1: Create Databricks Workspace
Click Create workspace to proceed.









Step 2: Configure external data access
Click Catalog → Settings → Metastore to proceed.

Step 3: Configure Unity Catalog access settings
Part A: Choose an authentication method
There are two ways to authenticate and authorize a StreamNative cluster to access Databricks Unity Catalog.

Personal Access Token (PAT)
Databricks recommends using a PAT token for development and testing purposes only. In your Databricks workspace, click your Databricks username in the top bar, and then select Settings from the drop-down. Then navigate to Developer → Access tokens and click Manage.


OAuth2
Databricks recommends using OAuth2 to configure authentication between StreamNative Cloud and Databricks Unity Catalog for production deployments. To configure OAuth2 machine-to-machine (M2M) authentication, follow the steps below to generate a Client ID and Secret for a Service Principal. Within the settings of your Databricks workspace, navigate to Identity and access, and click the Manage button next to Service principals.
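If you prefer to manage the Service Principal declaratively, a minimal sketch using the Databricks Terraform provider is shown below. The resource and display name are illustrative assumptions; the OAuth client ID and secret are still generated from the Databricks UI or account console as described above.

```hcl
# Sketch only: creates a service principal for the StreamNative integration.
# The display name is a placeholder; generate its OAuth client ID/secret in the Databricks UI.
resource "databricks_service_principal" "streamnative_ursa" {
  display_name = "streamnative-ursa-integration"
}
```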


Part B: Configure access permissions
Grant the necessary privileges on the catalog to ensure appropriate access permissions (a Terraform sketch for granting these privileges follows the list below).
- For a PAT token, select All Accounts.
- For OAuth2, select the Service Principal. The Service Principal might not show up in the drop-down, and you may need to search for it.
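As an alternative to the UI, the grant can be expressed with the Databricks Terraform provider. This is a minimal sketch; the catalog name, principal, and privilege list are placeholders, and you should adjust the privileges to whatever your deployment actually requires.

```hcl
# Sketch: grant catalog privileges to the principal used by StreamNative.
# For OAuth2, the principal is the service principal's application (client) ID.
resource "databricks_grants" "streamnative_catalog_access" {
  catalog = "<CATALOG_NAME>"

  grant {
    principal  = "<SERVICE_PRINCIPAL_APPLICATION_ID>"
    privileges = ["USE_CATALOG", "USE_SCHEMA", "CREATE_TABLE", "SELECT", "MODIFY"]
  }
}
```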

Step 4: Setup storage bucket
Choose the bucket location and grant access to StreamNative Cloud. You have two choices for setting up a storage bucket.

Option 1: Use your own bucket

You need to create your own storage bucket, with the option to create a bucket path. StreamNative will require access to this storage bucket. To grant access, run the following Terraform module for your cloud provider; a hedged sketch of the module call follows each variable list.

AWS
- external_id: your StreamNative organization ID (directions for finding your StreamNative organization follow the Terraform modules)
- role: the name of the role that will be created in AWS IAM; the role ARN is needed when creating the cluster
- buckets: the bucket name and path
- account_id: the AWS account ID
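A minimal sketch of the AWS module call is shown below. The module source is a placeholder (use the source given in the StreamNative documentation), and the variable types shown are assumptions based on the descriptions above.

```hcl
# Sketch only: grants StreamNative access to a user-provided S3 bucket.
module "streamnative_aws_bucket_access" {
  source = "<STREAMNATIVE_AWS_BUCKET_ACCESS_MODULE>" # placeholder module source

  external_id = "<STREAMNATIVE_ORG_ID>"           # your StreamNative organization ID
  role        = "streamnative-ursa-bucket-access" # IAM role to create; note its ARN for cluster creation
  buckets     = "<BUCKET_NAME>/<BUCKET_PATH>"     # bucket name and optional path
  account_id  = "<AWS_ACCOUNT_ID>"                # AWS account that owns the bucket
}
```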
GCP
- streamnative_org_id: your StreamNative organization ID (directions for finding your StreamNative organization follow the Terraform modules)
- project: the GCP project where the bucket is located
- cluster_projects: the GCP project where the StreamNative BYOC Ursa cluster is located
- google_service_account_name: the name of the service account that will be created in GCP; the service account email is needed when creating the cluster
- buckets: the bucket name and path
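A matching sketch for GCP follows; again, the module source is a placeholder and the variable types are assumptions.

```hcl
# Sketch only: grants StreamNative access to a user-provided GCS bucket.
module "streamnative_gcp_bucket_access" {
  source = "<STREAMNATIVE_GCP_BUCKET_ACCESS_MODULE>" # placeholder module source

  streamnative_org_id         = "<STREAMNATIVE_ORG_ID>"
  project                     = "<BUCKET_PROJECT_ID>"          # GCP project containing the bucket
  cluster_projects            = ["<CLUSTER_PROJECT_ID>"]       # project hosting the BYOC Ursa cluster
  google_service_account_name = "streamnative-ursa-gcs-access" # note the SA email for cluster creation
  buckets                     = "<BUCKET_NAME>/<BUCKET_PATH>"  # bucket name and optional path
}
```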


Step 5: Grant bucket permissions to the Databricks Unity Catalog role
You can choose one of the following to create credentials according to your requirements.

Databricks workspace is not under the same AWS account as your S3 bucket
- Log in to the AWS console where the S3 bucket is located.
- Refer to this document to create a storage credential (a Terraform sketch of an equivalent credential follows below).
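If you manage Unity Catalog resources with Terraform, a storage credential pointing at an IAM role in the bucket's account can be declared as sketched below; the names and the role ARN are placeholders.

```hcl
# Sketch: a Unity Catalog storage credential backed by an IAM role
# in the AWS account that owns the S3 bucket.
resource "databricks_storage_credential" "ursa_bucket" {
  name = "streamnative-ursa-bucket"

  aws_iam_role {
    role_arn = "arn:aws:iam::<BUCKET_ACCOUNT_ID>:role/<UNITY_CATALOG_ACCESS_ROLE>"
  }

  comment = "Access to the bucket used by the StreamNative Ursa cluster"
}
```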
Databricks workspace is under the same AWS account as your S3 bucket
During the Databricks workspace initialization, an AWS role is automatically created for the Unity Catalog. You can view the Unity Catalog role's ARN (Amazon Resource Name) in the AWS console.

AWS

Click Catalog → Settings → Credentials to proceed.

- Access the AWS IAM console: Log in to the AWS Management Console and navigate to the IAM service.
- Search for the role: In the IAM dashboard, search for the IAM role.
- View role details: Click on the role to open its detail page.

<YOUR_CLOUD_ENVIRONMENT_ID>-tiered-storage-snc





GCP

Grant the Databricks Unity Catalog service account the following permissions on the storage bucket (a Terraform sketch follows the list):
- storage.buckets.get
- storage.objects.create
- storage.objects.delete
- storage.objects.get
- storage.objects.list
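One way to apply this grant is with Terraform, as sketched below. The custom role ID, bucket name, and service account email are placeholders, and you could equally grant an existing predefined role that covers these permissions.

```hcl
# Sketch: a custom role containing exactly the permissions listed above,
# granted to the Databricks Unity Catalog service account on the bucket.
resource "google_project_iam_custom_role" "unity_catalog_bucket_access" {
  role_id = "unityCatalogBucketAccess"
  title   = "Unity Catalog bucket access"
  permissions = [
    "storage.buckets.get",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.list",
  ]
}

resource "google_storage_bucket_iam_member" "unity_catalog_access" {
  bucket = "<BUCKET_NAME>"
  role   = google_project_iam_custom_role.unity_catalog_bucket_access.id
  member = "serviceAccount:<DATABRICKS_UNITY_CATALOG_SA_EMAIL>"
}
```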



Setup StreamNative Cluster
Before creating a cluster, make sure you complete the steps for granting vendor access, creating a Cloud Connection, and setting up the Cloud Environment. You can also watch this video to learn more about deploying a StreamNative Cluster.

Step 1: Create an Ursa cluster in StreamNative Cloud Console
In this section, we will create and set up a cluster in StreamNative Cloud. Log in to StreamNative Cloud and click ‘Create an instance and deploy cluster’.




- AWS role
- Region
- Bucket name
- Bucket path
- Confirm that StreamNative has been granted the necessary permissions to access your S3 bucket. The required permissions were granted by running the Terraform module in Step 4.

- GCP service account: use the complete email address, which can be found in GCP IAM (created by the Terraform module; this is not the GCP service account email for the Databricks workspace)
- Region
- Bucket name
- Bucket path
- Confirm that StreamNative has been granted the necessary permissions to access your GCP storage bucket. The required permissions were granted by running a Terraform module.

- Enable Catalog Integration
- Within Lakehouse tables, select Managed Table
- Select Databricks Unity Catalog as the Catalog Provider
- Enter Unity Catalog Details
- Enter catalog name
- Enter Schema name
- Enter URI
- Select the Authentication Type: Personal Access Token (PAT) or OAuth2



Step 2: Create an external location for the Databricks Unity Catalog
AWS

The Unity Catalog requires an external location to access the S3 bucket. To create an external location, follow these steps:


- External Location Name: Enter any name of your choice.
- URL: Specify the URL of the storage bucket.
- For a StreamNative Provided Bucket, the path has the following format.
s3://<CLOUD_ENVIRONMENT_ID>/<CLUSTER_ID>/compaction
- For a User Provided Bucket, the path has the following format.
s3://<CUSTOM_BUCKET>/<PATH>/compaction
- Storage Credential: Select the IAM role from the drop-down. You can fetch this role from the Databricks workspace you created in Step 1.

Then grant the required permissions on the new external location: External Location → Permissions → Grant.



- External location name
- URL: GCP storage bucket root
- Storage credential: choose the Databricks Service Account
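If you prefer to declare the external location with the Databricks Terraform provider rather than the UI, a minimal sketch is shown below; the resource name, URL, and credential name are placeholders, and the commented URLs reflect the path formats described above.

```hcl
# Sketch: an external location over the bucket used by the Ursa cluster.
resource "databricks_external_location" "ursa_offload" {
  name = "streamnative-ursa-offload"

  # AWS, StreamNative-provided bucket: s3://<CLOUD_ENVIRONMENT_ID>/<CLUSTER_ID>/compaction
  # AWS, user-provided bucket:         s3://<CUSTOM_BUCKET>/<PATH>/compaction
  # GCP:                               gs://<BUCKET_NAME>
  url = "s3://<CUSTOM_BUCKET>/<PATH>/compaction"

  credential_name = "<STORAGE_CREDENTIAL_NAME>" # IAM role credential (AWS) or Databricks service account (GCP)
}
```

Permissions on the external location can then be granted as described above (External Location → Permissions → Grant).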



Step 3: Produce Kafka messages to a topic
Follow the creating and running a producer section to produce Kafka messages to a topic. After the external location has been created and messages have been produced, navigate to the next step.

Review Ingested Data In Databricks
Step 1: Check the Databricks Unity Catalog console
In the Databricks Unity Catalog console, you will see that a table has already been created and is available for use.
[NOTE]: StreamNative Cloud adheres to the following conventions for converting special characters:
- / is replaced with __
- - is replaced with ___
- . is replaced with ____
For example, a name containing orders.v1 would appear as orders____v1, and my-topic would appear as my___topic.
Step 2: Check the storage bucket
The messages from the topic will be automatically offloaded to the configured storage bucket as shown in the figure below.
Step 3: View ingested data in the Databricks Unity Catalog
At this point, users can view the ingested data in the Unity Catalog as shown in the figure below. Log in to the Databricks workspace and navigate to the catalog to view the ingested data in the tables.