
Integrate With S3 Tables

Introduction

This guide offers a detailed walkthrough for integrating StreamNative Cloud with AWS S3 Tables. By following this guide, you will enable seamless interaction between StreamNative and AWS S3 Table buckets, while also making data queryable through AWS Lake Formation integrations.

Setup AWS S3 Table bucket and S3 general purpose bucket

AWS S3 Tables is available in multiple regions, and all of these regions are supported by StreamNative's S3 Tables integration. Create an S3 Table bucket in the region of your choice. In addition to creating the S3 Table bucket, you must choose whether to use the StreamNative-provided storage bucket for the cluster (this will be in the region of the BYOC Cloud Environment) or to supply your own S3 general purpose bucket. The S3 Table bucket and the general purpose bucket used for cluster storage must be in the same region. Watch this video to learn more about the steps to set up an S3 Table bucket.

Step 1: Create AWS S3 Table bucket

Create an S3 Table bucket by navigating to Amazon S3 → Table buckets → Create table bucket.

Create S3 Table Bucket

Supply a bucket name and enable Integration with AWS analytics services in the region. In the image below, AWS analytics services is already enabled. Click Create table bucket.

Enter table bucket name

Copy the S3 Table bucket ARN. We will use this ARN when creating the StreamNative BYOC Ursa Cluster.
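If you prefer the command line, the table bucket can also be created and its ARN retrieved with the AWS CLI. This is a sketch; the bucket name and region below are placeholders, not values from this guide:

```shell
# Create an S3 Table bucket (name and region are placeholders)
aws s3tables create-table-bucket \
  --name my-table-bucket \
  --region us-east-2

# List table buckets to retrieve the ARN needed for the cluster setup
aws s3tables list-table-buckets --region us-east-2
```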

Step 2: Setup Cluster Storage Bucket

Choose a location for cluster storage. You have two options for setting up a storage bucket.

Use StreamNative provided bucket

This process involves deploying the StreamNative BYOC Cloud Environment. StreamNative will automatically assign the necessary permissions to this bucket. To proceed, you will need to complete the steps for granting vendor access, creating a Cloud Connection, and setting up the Cloud Environment.

Note: For this option, the Cloud Environment must be in the same region as the S3 Table bucket.

Use your own bucket

You can use your own storage bucket, with the option to create a subfolder. Note that this must be an S3 general purpose bucket in the same region as your S3 Table bucket. It is also recommended to deploy your StreamNative BYOC Ursa Cluster in the same region as the user-provided S3 general purpose bucket and the S3 Table bucket to avoid cross-region traffic. In the example below, we created an Amazon S3 general purpose bucket called sncustombucket with a subfolder called test.
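The example bucket and subfolder can also be created with the AWS CLI; sncustombucket and test mirror the names used in this guide, and the region is a placeholder:

```shell
# Create the general purpose bucket in the same region as the S3 Table bucket
aws s3 mb s3://sncustombucket --region us-east-2

# Create the "test" subfolder (a zero-byte object whose key ends in "/")
aws s3api put-object --bucket sncustombucket --key test/
```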

Custom bucket for storage

Step 3: Setup permissions for StreamNative

You can skip this step if you are using the StreamNative-provided bucket, since StreamNative already has permissions to that bucket and to the S3 Table bucket. If you are using your own storage bucket, complete this step to grant StreamNative access to both the user-provided storage bucket and the S3 Table bucket by executing the following Terraform module.

module "sn_managed_cloud_access_bucket" {
  source = "github.com/streamnative/terraform-managed-cloud//modules/aws/volume-access?ref=v3.19.0"

  external_id = "<your-organization>"
  role        = "<your-role-name>"
  buckets = [
    "<storage bucket>/<storage bucket path>"
  ]

  account_ids = [
    "<aws account id>"
  ]
}

module "sn_managed_cloud_access_s3_table" {
  source     = "github.com/streamnative/terraform-managed-cloud//modules/aws/s3-table-access?ref=v3.19.0"
  role       = module.sn_managed_cloud_access_bucket.role
  s3_tables  = [
    "<s3 table bucket arn>"
  ]
  depends_on = [module.sn_managed_cloud_access_bucket]
}

You can find your organization name in the StreamNative console, as shown below:

Organization name

Before executing the Terraform module, you must define the following environment variables. These variables provide credentials for the AWS account where the S3 bucket is located.

export AWS_ACCESS_KEY_ID="<YOUR_AWS_ACCESS_KEY_ID>"
export AWS_SECRET_ACCESS_KEY="<YOUR_AWS_SECRET_ACCESS_KEY>"
export AWS_SESSION_TOKEN="<YOUR_AWS_SESSION_TOKEN>"

Run the Terraform module

terraform init
terraform plan
terraform apply

Create StreamNative BYOC Ursa Cluster

To proceed, you will need to first complete the steps for granting vendor access, creating a Cloud Connection, and setting up the Cloud Environment. Then you can begin the process of deploying the StreamNative BYOC Ursa Cluster.

Step 1: Create a StreamNative BYOC Ursa Cluster in StreamNative Cloud Console

In this section, we create and set up a cluster in StreamNative Cloud. Log in to StreamNative Cloud and click + New or Create an instance and deploy a cluster.

Create StreamNative Ursa Cluster

Click on Deploy BYOC

Deploy BYOC

Enter an Instance name, select your Cloud Connection, select URSA Engine, and click Cluster Location.

Enter Instance name

Enter a Cluster Name, select your Cloud Environment, select Multi AZ, and click Lakehouse Storage Configuration.

Enter cluster name

To configure the Storage Location, there are two options:

Option 1: Select Use Existing BYOC Bucket to use the bucket created by StreamNative

Use existing storage bucket

Option 2: Select Use Your Own Bucket to choose your own storage bucket by entering the following details

  • AWS role arn (role created with terraform module)
  • Region
  • Bucket name
  • Bucket path
  • Confirm that StreamNative has been granted the necessary permissions to access your S3 bucket. The required permissions were granted by running the sn_managed_cloud_access_bucket and sn_managed_cloud_access_s3_table Terraform modules in the previous step.

Use custom storage bucket

To integrate with the AWS S3 Table bucket, enable Catalog Integration and select External Table, then select AWS S3 Table.

Enter the S3 Table Bucket ARN and click Cluster Size.

Enable catalog integration

Clicking Cluster Size tests the connection to the S3 general purpose bucket (if user-provided) and to the S3 Table bucket.

Test the connection

Click Continue to begin sizing your cluster.

For this example, we deploy using the smallest cluster size. Click Finish to start deploying the StreamNative BYOC Ursa Cluster into your Cloud Environment.

Size your cluster

When cluster deployment is complete, it will appear on the Organization Dashboard with a green circle.

Org dashboard with cluster created

The Lakehouse Storage configuration can be viewed by clicking on the Instance on the Organization Dashboard and selecting Configuration in the left pane.

View Lakehouse Storage config

Step 2: Produce Kafka messages to topic

Follow the creating and running a producer section to produce Kafka messages to a topic. StreamNative clusters currently support AVRO, Protobuf, JSON Schema, and primitive type schemas.
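As a minimal illustration, messages with a primitive string schema can be sent with the standard Kafka console producer. This is a sketch, assuming a client.properties file containing your cluster's SASL/SSL settings; the endpoint and topic name are placeholders:

```shell
# client.properties is assumed to hold the auth settings for your cluster
kafka-console-producer.sh \
  --bootstrap-server <your-cluster-kafka-endpoint>:9093 \
  --producer.config client.properties \
  --topic my-topic
```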

Step 3: Review S3 bucket

Navigate to the user-provided or StreamNative-provided S3 storage bucket. In this example, the user-provided bucket is s3://sncustombucket/test. A storage folder has been created by the cluster.

Navigate to storage bucket

Next, we check the S3 Table bucket. You should see a table inside your S3 Table bucket for each topic. Notice that the namespace of the table is pulsar_<tenant>_<namespace> for your topic. This will be important when creating a Resource Link in a following step.
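The namespaces and tables created by the cluster can also be inspected with the AWS CLI. This is a sketch; the ARN and namespace below are placeholders:

```shell
# List namespaces in the table bucket; expect one per Pulsar tenant/namespace,
# named pulsar_<tenant>_<namespace>
aws s3tables list-namespaces \
  --table-bucket-arn arn:aws:s3tables:us-east-2:111122223333:bucket/my-table-bucket

# List the tables (one per topic) within a given namespace
aws s3tables list-tables \
  --table-bucket-arn arn:aws:s3tables:us-east-2:111122223333:bucket/my-table-bucket \
  --namespace pulsar_public_default
```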

Check S3 Table bucket

Configure Lake Formation to access S3 Table bucket

We will query the S3 Table bucket with AWS analytics services through Lake Formation.

Step 1: Log into Lake Formation and enable the Federated Catalog for your S3 Table bucket

  1. Open the AWS Lake Formation console at https://console.aws.amazon.com/lakeformation/ and sign in as a data lake administrator. For more information on how to create a data lake administrator, see Create a data lake administrator.
  2. You should be prompted on the Lake Formation Catalogs page to create a federated catalog for your S3 Table buckets. Click Enable S3 Table Integration to enable the Federated catalog.

Enable Federated Catalog

  3. Once you complete this step, you should see a Federated s3tablescatalog.

Enable Federated Catalog

  4. Clicking into the s3tablescatalog, you should automatically see all S3 Table buckets in your region. As you create more S3 Table buckets, they should automatically show up in the Federated s3tablescatalog.

View all S3 Table Buckets

  5. Create a Resource Link for each namespace (e.g. pulsar_public_tenant) listed in the Tables of your S3 Table bucket.

aws glue create-database --region <region> --catalog-id "<aws account id>" --database-input '{
  "Name": "resource-link-name",
  "TargetDatabase": {
    "CatalogId": "<aws account id>:s3tablescatalog/<s3 table bucket>",
    "DatabaseName": "pulsar_<tenant>_<namespace>"
  },
  "CreateTableDefaultPermissions": []
}'
  6. To view the Resource Link, click into the Default Catalog. You should see an entry for the Resource Link we created.

View resource link in default catalog

Step 3: Grant Lake Formation Permissions

We grant permissions on the S3 Table bucket and the Resource Link.

To grant permissions on the S3 Table bucket using Lake Formation:


  1. On the Data permissions page in Lake Formation, select Grant.

Select grant on page permissions

  2. On the Permissions page, under Principals, do one of the following:
    • For Athena or Amazon Redshift, choose IAM users and roles and select the IAM user or role that will run queries.
    • For Firehose, choose IAM users and roles and select the service role you created to stream to tables.
    • For Amazon QuickSight, choose SAML users and groups and enter the ARN of your Amazon QuickSight admin user.
  3. Under LF-Tags or catalog resources, choose Named Data Catalog resources.
  4. For Catalogs, choose the catalog created when you created your S3 Table bucket (e.g. <aws account id>:s3tablescatalog/<s3 table bucket>).
  5. For Databases, choose the S3 Table bucket namespace with the corresponding resource link (e.g. pulsar_<tenant>_<namespace>).
  6. For Tables, choose the S3 Tables in your S3 Table bucket corresponding to specific topics, or select All tables.
  7. For Table permissions, choose Super.
  8. Choose Grant.
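The console steps above can also be expressed with the AWS CLI. This is a sketch, assuming an IAM role principal and the Super permission (exposed as "ALL" in the API) on all tables in the namespace; the account ID, role, bucket, and namespace are placeholders:

```shell
aws lakeformation grant-permissions \
  --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:role/AthenaQueryRole \
  --permissions "ALL" \
  --resource '{
    "Table": {
      "CatalogId": "111122223333:s3tablescatalog/my-table-bucket",
      "DatabaseName": "pulsar_public_default",
      "TableWildcard": {}
    }
  }'
```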

Choose Grant

Choose Grant

To grant permissions on the Resource Link using Lake Formation:

  1. On the Data permissions page in Lake Formation, select Grant.
  2. On the Permissions page, under Principals, do one of the following:
    • For Athena or Amazon Redshift, choose IAM users and roles and select the IAM user or role that will run queries.
    • For Firehose, choose IAM users and roles and select the service role you created to stream to tables.
    • For Amazon QuickSight, choose SAML users and groups and enter the ARN of your Amazon QuickSight admin user.
  3. Under LF-Tags or catalog resources, choose Named Data Catalog resources.
  4. For Catalogs, choose your account ID, which is the Default Catalog.
  5. For Databases, choose the resource link you created for your S3 Table bucket namespace.
  6. For Tables, choose the S3 Tables in your S3 Table bucket corresponding to specific topics, or select All tables.
  7. For Table permissions, choose Describe.
  8. Choose Grant.

Choose Grant

Choose Grant

Step 4: Query S3 Table bucket with AWS Athena

  1. Log into AWS Athena.
  2. To query data using AWS Athena, select the following in the left pane:

Data source: choose AwsDataCatalog

Catalog: s3tablescatalog/<s3_table_bucket>

Database: pulsar_<tenant>_<namespace>

Tables and views: choose table (topic)

  3. Execute the desired query
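Equivalently, a query can be submitted from the AWS CLI by fully qualifying the table with the catalog and namespace. This is a sketch; the bucket, namespace, table, and results location are placeholders:

```shell
# Submit the query; note the query execution ID it returns
aws athena start-query-execution \
  --query-string 'SELECT * FROM "s3tablescatalog/my-table-bucket"."pulsar_public_default"."my_topic" LIMIT 10' \
  --result-configuration OutputLocation=s3://my-athena-results/

# Fetch the results once the query completes
aws athena get-query-results --query-execution-id <query-execution-id>
```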

Query iceberg tables in Athena

View this video to learn more about how to query S3 Table Bucket with Amazon Athena.
