This page provides instructions for setting up catalog integration with Google BigLake metastore for Apache Iceberg on Google Cloud Platform.

Introduction

StreamNative Cloud integrates with Google BigLake metastore to stream Kafka topic data directly into Apache Iceberg tables through BigLake’s Iceberg REST Catalog support. This lets organizations continuously land real-time streaming data in governed lakehouse tables without building complex ingestion pipelines.

By leveraging the Iceberg REST catalog protocol, the integration provides strong schema enforcement, lineage tracking, and security, making streaming data AI- and analytics-ready while simplifying operations for data teams. Combining StreamNative’s real-time streaming capabilities with BigLake’s centralized metadata management and governance lets customers unify operational and analytical data architectures on open table formats for AI and analytics workloads.

Prerequisites

Before integrating Google BigLake with StreamNative Cloud, ensure the following prerequisites are met:
  • A Google Cloud Platform account with BigLake API enabled.
  • A StreamNative Cloud account with an active cluster.
  • A Google Cloud Storage bucket to be used as the warehouse location.
  • Appropriate IAM permissions to create and manage BigLake resources.

Set Up Google BigLake

Enable the BigLake API

Before creating managed Iceberg tables in BigLake metastore, you need to enable the BigLake API in your Google Cloud project.
  1. Navigate to the Google Cloud Console.
  2. Select your project.
  3. Navigate to APIs & Services > Enable APIs and Services.
  4. Search for BigLake API and enable it.

Create a BigLake Catalog

To enable streaming data from StreamNative Cloud into Apache Iceberg tables, first create a catalog in Google BigLake metastore. When configuring the catalog, select a Cloud Storage bucket in the same region as your StreamNative Pulsar or Kafka (Ursa-powered) cluster to avoid cross-region network traffic and the associated latency and cost. Note that BigLake currently maintains a 1:1 mapping between a catalog and a Cloud Storage bucket; sub-directories within a bucket cannot be used to create multiple catalogs. During setup, choose Credential vending mode under authentication so that BigLake can securely manage access to the storage location used for Iceberg tables. After creation, you can view the catalog details in the Google Cloud Console.
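The catalog you create is addressable through the Iceberg REST catalog protocol. As a rough client-side sketch (the endpoint URI, header name, project ID, and bucket below are illustrative assumptions, not values from this page), the connection properties might be assembled and sanity-checked like this:

```python
# Sketch of the Iceberg REST catalog properties a client would use to reach
# a BigLake metastore catalog. The endpoint URI, header name, and all
# identifiers are illustrative assumptions -- confirm the exact values in
# the BigLake Iceberg REST catalog documentation for your project.

PROJECT_ID = "my-gcp-project"         # hypothetical project ID
WAREHOUSE = "gs://my-iceberg-bucket"  # the bucket chosen for the catalog

catalog_props = {
    # BigLake exposes an Iceberg REST catalog endpoint (URI is an assumption)
    "uri": "https://biglake.googleapis.com/iceberg/v1/restcatalog",
    # 1:1 mapping: one catalog per Cloud Storage bucket
    "warehouse": WAREHOUSE,
    # hypothetical header routing requests to the right project; with
    # credential vending mode, BigLake hands out scoped storage credentials,
    # so the client does not need direct bucket keys
    "header.x-goog-user-project": PROJECT_ID,
}

def validate(props: dict) -> list:
    """Return a list of configuration problems (empty means OK)."""
    errors = []
    if not props.get("warehouse", "").startswith("gs://"):
        errors.append("warehouse must be a Cloud Storage (gs://) bucket")
    if not props.get("uri", "").startswith("https://"):
        errors.append("uri must be an HTTPS REST endpoint")
    return errors

print(validate(catalog_props))  # -> []
```

A property set in this shape is what an Iceberg REST client (for example, pyiceberg's `load_catalog`) would consume; the validation helper simply catches the two constraints this page calls out, namely a Cloud Storage warehouse and an HTTPS REST endpoint.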

Create a StreamNative Cluster

Create a StreamNative Cloud Kafka or Pulsar cluster using the Cost-Optimized profile, without enabling Lakehouse storage during the initial setup. Lakehouse integration is intentionally skipped at this stage so that the required permissions can be configured first. Click Cluster Size to configure the cluster, then provision it.

Register a BigLake catalog in StreamNative Cloud

Follow these steps to register a Google BigLake catalog in StreamNative Cloud:
  1. Navigate to Organization Settings in the StreamNative Cloud console.
  2. Click Register Catalog to register a new catalog.
  3. Enter a Catalog Name.
  4. Select Google BigLake as the Catalog Provider.
  5. In the Google BigLake Details section:
    1. Enter your Google Cloud Project ID.
    2. Enter the Warehouse (GCS bucket) location.
  6. Click Register to complete the catalog registration.
Once registered, the catalog becomes available for lakehouse integrations and can be used to stream topic data into Iceberg tables managed by BigLake.

Enable Lakehouse in the Cluster

Follow these steps to enable Lakehouse Tables for an existing StreamNative Kafka cluster using a registered Google BigLake catalog:
  1. Navigate to your existing Kafka cluster:
    1. Go to Instance.
    2. Select your Kafka Cluster.
    3. Click Configuration.
    4. Click Edit Cluster.
  2. Enable lakehouse integration:
    1. Turn on Enable Lakehouse Table.
    2. From the catalog dropdown, select the pre-registered BigLake catalog.
  3. Configure required Google Cloud permissions:
    1. Follow the instructions shown in the StreamNative Cloud UI to identify the Google IAM service account that requires access.
    2. Go to the Google Cloud Console.
    3. Grant the following roles to the IAM account:
      • BigLake Editor
      • Storage Object User
      • Service Usage Consumer
    4. Save the IAM permissions in the Google Cloud Console.
  4. Return to StreamNative Cloud and continue the workflow to complete the lakehouse table enablement.
Once Lakehouse Tables are enabled, data from your Kafka topics will automatically start appearing as Apache Iceberg tables in your selected BigLake catalog, making the data immediately available for analytics and querying through the Google BigLake ecosystem.
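The IAM grants in step 3 can be sketched as `gcloud projects add-iam-policy-binding` invocations, one per role. The role IDs and the service-account address below are assumptions: map the display names shown in the console (BigLake Editor, Storage Object User, Service Usage Consumer) to their exact IDs in the IAM documentation, and substitute the service account reported by the StreamNative Cloud UI.

```python
# Sketch: build the gcloud commands that grant the three roles from step 3
# to the service account identified in the StreamNative Cloud UI. The role
# IDs and the service-account address are assumptions -- verify them against
# the IAM role reference before running anything.

PROJECT_ID = "my-gcp-project"  # hypothetical project ID
SERVICE_ACCOUNT = "sn-lakehouse@my-gcp-project.iam.gserviceaccount.com"  # hypothetical

ROLES = [
    "roles/biglake.editor",                     # BigLake Editor (assumed ID)
    "roles/storage.objectUser",                 # Storage Object User
    "roles/serviceusage.serviceUsageConsumer",  # Service Usage Consumer
]

def binding_commands(project: str, member: str, roles: list) -> list:
    """One `gcloud projects add-iam-policy-binding` invocation per role."""
    return [
        f"gcloud projects add-iam-policy-binding {project} "
        f"--member=serviceAccount:{member} --role={role}"
        for role in roles
    ]

for cmd in binding_commands(PROJECT_ID, SERVICE_ACCOUNT, ROLES):
    print(cmd)
```

Generating the commands rather than editing the policy in place keeps the grants reviewable; the equivalent can of course be done through the IAM page in the Google Cloud Console, as the steps above describe.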

Verify the Integration

After enabling Lakehouse Tables, you can verify the integration by querying the Iceberg tables created by your StreamNative Kafka cluster using Apache Spark with the BigLake Iceberg catalog. Follow these high-level steps to validate the setup:
  1. Configure Apache Spark to use the BigLake Iceberg catalog. Connect your Spark session to the BigLake metastore using the Iceberg REST catalog configuration provided by Google BigLake. This allows Spark to automatically discover the Iceberg tables created from your Kafka topics.
  2. Set up authentication. Configure Google Cloud authentication using a service account key or Application Default Credentials (ADC) with permissions to access BigLake and the associated GCS warehouse.
  3. Connect Spark to the BigLake catalog. Add the required Iceberg and BigLake catalog configurations to your Spark session, including:
    • Catalog type (Iceberg REST or BigLake Iceberg catalog)
    • GCP project
    • Warehouse bucket
    • Authentication configuration
  4. List the Iceberg tables. Once connected, list the namespaces and tables to confirm that the Kafka topics are visible as Iceberg tables:
    • Verify that the namespace created by StreamNative appears.
    • Confirm that topics are represented as Iceberg tables.
  5. Run a validation query. Execute a simple Spark SQL query (for example, SELECT * FROM <table> LIMIT 10) against one of the Iceberg tables to confirm that streaming data from StreamNative is being written correctly into BigLake.
  6. Confirm data freshness. Produce new messages into a Kafka topic and re-run the Spark query to verify that the new records appear in the Iceberg table.
This validation confirms that StreamNative is successfully streaming Kafka topic data into Google BigLake as Iceberg tables and that the data is ready for downstream analytics, AI, and batch processing workloads.
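The Spark-side configuration in steps 1-3 can be sketched as a set of catalog properties. The property keys are Iceberg's standard Spark catalog settings; the catalog name, endpoint URI, project, and bucket below are assumptions for illustration only.

```python
# Sketch of Spark properties for attaching an Iceberg REST catalog backed by
# BigLake metastore (steps 1-3 of the verification flow). The keys follow
# Iceberg's standard Spark catalog configuration; the catalog name, endpoint
# URI, and bucket are illustrative assumptions. Authentication settings
# (service account key or ADC, per step 2) are environment-specific and
# omitted here.

CATALOG = "biglake"                   # hypothetical catalog name in Spark
WAREHOUSE = "gs://my-iceberg-bucket"  # hypothetical warehouse bucket

spark_conf = {
    f"spark.sql.catalog.{CATALOG}": "org.apache.iceberg.spark.SparkCatalog",
    f"spark.sql.catalog.{CATALOG}.type": "rest",
    # BigLake's Iceberg REST endpoint (URI is an assumption -- check the docs)
    f"spark.sql.catalog.{CATALOG}.uri":
        "https://biglake.googleapis.com/iceberg/v1/restcatalog",
    f"spark.sql.catalog.{CATALOG}.warehouse": WAREHOUSE,
}

# Applied to a real session, each entry would be passed to the builder:
#   spark = SparkSession.builder
#   for key, value in spark_conf.items():
#       spark = spark.config(key, value)
#   spark = spark.getOrCreate()
# and steps 4-5 then become:
#   spark.sql(f"SHOW NAMESPACES IN {CATALOG}").show()
#   spark.sql(f"SELECT * FROM {CATALOG}.my_namespace.my_topic LIMIT 10").show()

# Minimal sanity checks on the assembled configuration:
assert spark_conf[f"spark.sql.catalog.{CATALOG}.type"] == "rest"
assert spark_conf[f"spark.sql.catalog.{CATALOG}.warehouse"].startswith("gs://")
```

This is a sketch of the wiring, not a definitive configuration: the authoritative property values come from the Iceberg REST catalog details that Google BigLake exposes for your catalog.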