Lakehouse Table is a zero-ETL integration that automatically converts streaming data from Apache Pulsar topics into open table formats — Apache Iceberg and Delta Lake — stored directly on object storage (AWS S3, GCS, Azure Blob Storage). This enables unified streaming and analytics access to the same data without building or maintaining separate data pipelines.
Architecture
┌──────────────────┐
│ Pulsar Broker │
└────────┬─────────┘
│
┌───────────────┴────────────────┐
│ WAL Storage │
│ ┌──────────────────────────┐ │
│ │ Latency-Optimized: │ │
│ │ Apache BookKeeper │ │
│ ├──────────────────────────┤ │
│ │ Cost-Optimized: │ │
│ │ Object Storage │ │
│ │ (S3 / GCS / Azure) │ │
│ └──────────────────────────┘ │
└───────────────┬────────────────┘
│ Reads from both
▼
┌─────────────────────────────┐
│ Compaction Service │
│ (WAL → Parquet conversion) │
└────────────┬────────────────┘
│ Commit
▼
┌─────────────────────────────┐
│ Lakehouse Table │
│ (Iceberg / Delta Lake) │
└─────────────────────────────┘
│
┌────────────┴────────────┐
▼ ▼
External Catalog Query Engines
(Unity Catalog, S3Table, (Spark, Trino,
BigLake, Snowflake) DuckDB, Athena)
WAL Storage Options
Lakehouse Table supports two WAL storage tiers:
- Latency-optimized (Apache BookKeeper): Low-latency writes for performance-sensitive workloads
- Cost-optimized (Object Storage): Direct writes to AWS S3, GCS, or Azure Blob Storage for cost efficiency
The Compaction Service reads from both BookKeeper and Object Storage, converts the data to Parquet format, and commits snapshots to the lakehouse catalog.
Coordination
Oxia serves as the metadata store for coordination, leader election, schema storage, and offset index management.
The Compaction Service operates with a leader-worker architecture: the leader publishes compaction tasks and commits results to the lakehouse catalog, while workers perform the WAL-to-Parquet conversion.
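The leader-worker flow can be sketched as a task queue: the leader publishes one compaction task per pending WAL segment, workers pivot the row-oriented WAL entries into a columnar layout, and the leader commits the results. This is an illustrative pure-Python sketch, not the service's implementation; every name here (CompactionTask, leader_publish, worker_compact, leader_commit) is hypothetical, and the Parquet conversion is stood in for by a simple row-to-column pivot.

```python
from dataclasses import dataclass
from queue import Queue

@dataclass
class CompactionTask:
    topic: str
    wal_segment: list[dict]  # decoded WAL entries for one segment

def leader_publish(tasks: Queue, segments: dict[str, list[dict]]) -> None:
    # Leader: turn pending WAL segments into compaction tasks.
    for topic, entries in segments.items():
        tasks.put(CompactionTask(topic, entries))

def worker_compact(task: CompactionTask) -> dict:
    # Worker: WAL-to-columnar conversion, sketched as pivoting
    # row-oriented entries into per-field column arrays.
    columns: dict[str, list] = {}
    for entry in task.wal_segment:
        for key, value in entry.items():
            columns.setdefault(key, []).append(value)
    return {"topic": task.topic, "columns": columns}

def leader_commit(results: list[dict]) -> dict[str, dict]:
    # Leader: commit one snapshot per table to the catalog (here, a dict).
    return {r["topic"]: r["columns"] for r in results}
```

In the real service the commit step writes an Iceberg or Delta Lake snapshot, and Oxia provides the leader election and offset index that this sketch omits.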
Table Modes
External Table (SDT — Stream Delivered Table)
An External Table delivers data from Pulsar topics into an external lakehouse catalog (such as Databricks Unity Catalog, Snowflake, AWS S3Table, or Google BigLake). A separate copy of the topic data is written to the Lakehouse table — the Pulsar topic and the Lakehouse table hold independent copies of the same records. In this mode:
- Data is written to Iceberg or Delta Lake tables managed by the external catalog as a separate copy from the Pulsar topic
- Analytical access via standard table APIs (Spark, Trino, DuckDB, Athena, etc.)
- Supports upsert, partition key, and schema evolution
- The external catalog governs data lifecycle (retention, deletion) for the Lakehouse copy independently of the Pulsar topic
- Streaming reads with offset semantics are not supported on the delivered data
Use External Tables when: you want to deliver streaming data into curated lakehouse tables for analytics, integrate with existing data platforms, or need upsert/deduplication capabilities.
Internal Table (SBT — Stream Backed Table)
Coming Soon — Internal Table support is under active development.
An Internal Table is managed entirely by Ursa Storage. The Pulsar topic and the Lakehouse table share the same single copy of data — there is no separate write to the Lakehouse table. The same physical data supports both streaming reads (with offset tracking and replay) and analytical queries — true stream-table duality with zero data duplication.
Supported Cluster Profiles and Protocols
StreamNative Private Cloud offers two cluster types (Pulsar and Kafka), each with two performance profiles (latency-optimized and cost-optimized). Lakehouse delivery support depends on the combination of cluster type, profile, and producer protocol.
| Cluster Type | Profile | Producer Protocol | Lakehouse Delivery |
|---|---|---|---|
| Pulsar | Latency-optimized | Pulsar | Supported |
| Pulsar | Latency-optimized | Kafka | Coming Soon |
| Pulsar | Cost-optimized | Kafka | Supported |
| Pulsar | Cost-optimized | Pulsar | Not yet supported |
| Kafka | Cost-optimized | Kafka | Supported |
| Kafka | Latency-optimized | Kafka | Coming Soon |
Notes:
- A Pulsar latency-optimized cluster uses Apache BookKeeper as the WAL tier. Topic data produced via the Pulsar protocol can be delivered to Lakehouse today; Kafka-protocol delivery is on the roadmap.
- A Pulsar cost-optimized cluster uses object storage as the WAL tier. Topic data produced via the Kafka protocol is delivered to Lakehouse; the Pulsar protocol is not yet supported on this profile.
- A Kafka cost-optimized cluster delivers Kafka topic data to Lakehouse today.
- A Kafka latency-optimized cluster will support Lakehouse delivery in a future release.
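The matrix above can be encoded as a lookup table, which is convenient when validating a planned combination programmatically. The dictionary mirrors the table exactly; the function name and the "Unsupported combination" fallback are hypothetical.

```python
# Lakehouse delivery support, keyed by (cluster type, profile, producer protocol).
LAKEHOUSE_DELIVERY = {
    ("Pulsar", "Latency-optimized", "Pulsar"): "Supported",
    ("Pulsar", "Latency-optimized", "Kafka"): "Coming Soon",
    ("Pulsar", "Cost-optimized", "Kafka"): "Supported",
    ("Pulsar", "Cost-optimized", "Pulsar"): "Not yet supported",
    ("Kafka", "Cost-optimized", "Kafka"): "Supported",
    ("Kafka", "Latency-optimized", "Kafka"): "Coming Soon",
}

def delivery_status(cluster: str, profile: str, protocol: str) -> str:
    # Combinations absent from the table (e.g. Kafka cluster / Pulsar
    # protocol) fall through to the catch-all string.
    return LAKEHOUSE_DELIVERY.get((cluster, profile, protocol), "Unsupported combination")
```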
Supported Table Formats
| Format | Status |
|---|---|
| Apache Iceberg | Supported |
| Delta Lake | Supported |
Supported Cloud Storage
| Provider | WAL Storage | Lakehouse Table |
|---|---|---|
| AWS S3 | Supported | Supported |
| Google Cloud Storage | Supported | Supported |
| Azure Blob Storage | Supported | Supported |
Supported Catalogs
Catalog support varies by cloud provider:
| Catalog | Table Format | AWS | GCP | Azure |
|---|---|---|---|---|
| Databricks Unity Catalog (Managed Iceberg) | Iceberg | Supported | Supported | Supported |
| Databricks Unity Catalog (Delta Lake) | Delta Lake | Supported | Supported | Supported |
| Snowflake Horizon Catalog | Iceberg | Supported | Supported | Supported |
| Snowflake Open Catalog (Polaris) | Iceberg | Supported | Supported | Supported |
| AWS S3Table | Iceberg | Supported | — | — |
| Google BigLake | Iceberg | — | Supported | — |
Topic to lakehouse identifier mapping
When data is delivered from a Pulsar topic to a lakehouse table, the topic’s tenant, namespace, and topic name are mapped to a catalog namespace and a table name. The mapping rules differ by catalog type because Pulsar allows characters (/, ., -, :) that are not valid in many catalog identifiers.
The compaction service applies the following rules:
Iceberg with hierarchical catalogs (Snowflake Open Catalog, Snowflake Horizon, Iceberg REST, Iceberg Hadoop)
The original Pulsar identifiers are used unchanged:
- Catalog namespace: <tenant>.<namespace> (two-level)
- Table name: the topic local name (the part after the namespace)
For example, the topic persistent://my-tenant/my-namespace/orders is mapped to namespace my-tenant.my-namespace and table orders.
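The hierarchical-catalog rule can be sketched as a small function. This is an illustrative sketch of the mapping described above, not the service's code; the function name is hypothetical.

```python
def hierarchical_mapping(topic: str) -> tuple[str, str]:
    """Map a Pulsar topic to (catalog namespace, table name) for
    hierarchical Iceberg catalogs, per the rule above."""
    # Drop the "persistent://" / "non-persistent://" scheme if present.
    _, _, path = topic.rpartition("://")
    tenant, namespace, local_name = path.split("/", 2)
    # Identifiers pass through unchanged: two-level namespace, topic local name.
    return f"{tenant}.{namespace}", local_name
```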
Iceberg with flat-namespace catalogs (AWS S3Tables, Google BigLake, Hive)
These catalogs only accept a single-level namespace, so the tenant and namespace are flattened into one identifier with a cluster-name prefix. Each component is escaped to remove invalid characters:
| Source character | Replacement |
|---|---|
| / | ___ (three underscores) |
| . | _ (one underscore) |
| - | __ (two underscores) |
| : | ____ (four underscores) |
- Catalog namespace: <cluster>_<formatted-tenant>_<formatted-namespace> (the default cluster prefix is pulsar; configurable via the cluster property in the compaction service)
- Table name (S3Tables): the topic local name with the same character escapes applied
- Table name (BigLake, Hive): the topic local name as-is
For example, with the default cluster prefix pulsar, topic persistent://public-v1/default.v2/test-table-v1:
- On AWS S3Tables: namespace pulsar_public__v1_default_v2, table test__table__v1
- On Google BigLake: namespace pulsar_public__v1_default_v2, table test-table-v1
Databricks Unity Catalog (Iceberg or Delta Lake)
Unity Catalog uses a three-level identifier (catalog.schema.table). The compaction service writes all topics into a single schema and encodes the full Pulsar topic path into the table name, so each catalog table maps 1:1 to a Pulsar topic. The full topic path <tenant>/<namespace>/<topic> is flattened with these escapes:
| Source character | Replacement |
|---|---|
| / | __ (two underscores) |
| . | ____ (four underscores) |
| - | ___ (three underscores) |
For example, topic persistent://public/default/test-topic is mapped to table name public__default__test___topic. Topic persistent://public/default/v1.events is mapped to public__default__v1____events.
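The Unity Catalog flattening can likewise be sketched in a few lines. Again, this is an illustration of the rule above with a hypothetical function name, not the service's code.

```python
def unity_table_name(topic: str) -> str:
    """Flatten a full Pulsar topic path into a Unity Catalog table name,
    per the escape table above."""
    _, _, path = topic.rpartition("://")
    # Replacement order is safe: every replacement is a run of underscores,
    # and underscores are never themselves escaped.
    return path.replace(".", "____").replace("-", "___").replace("/", "__")
```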
Next Steps