Lakestream is the architectural paradigm that unifies real-time data streaming with lakehouse storage. It separates data, metadata, and protocol into three independent layers, enabling both Kafka and Pulsar to run on the same lakehouse-native foundation. StreamNative’s Ursa Engine implements the storage layer of Lakestream and was recognized with the VLDB 2025 Best Industry Paper award for its novel approach to lakehouse-native stream storage. Read more in the blog post: Ursa Wins VLDB 2025 Best Industry Paper.

Architecture

Lakestream separates the streaming stack into three independent layers.

Figure: Lakestream Architecture

Protocol Layer (stateless serving)

Brokers are stateless and leaderless. Any broker can handle produce or fetch requests for any partition. There is no leader election, no partition rebalancing, and no broker-to-broker replication. This means:
  • Compute scales independently from storage
  • Brokers can be added or removed without data migration
  • No cross-AZ replication traffic between brokers
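
The stateless, leaderless behavior described above can be illustrated with a small in-memory model. This is a minimal sketch, not StreamNative's implementation: `SharedLog` and `Broker` are hypothetical names, and a Python dictionary stands in for the shared storage layer. It only shows why a broker that owns no data can be added or removed freely.

```python
from dataclasses import dataclass, field


@dataclass
class SharedLog:
    """Hypothetical stand-in for shared storage: one append-only list per partition."""
    partitions: dict[int, list[bytes]] = field(default_factory=dict)

    def append(self, partition: int, record: bytes) -> int:
        log = self.partitions.setdefault(partition, [])
        log.append(record)
        return len(log) - 1  # offset of the record just written

    def read(self, partition: int, offset: int) -> list[bytes]:
        return self.partitions.get(partition, [])[offset:]


@dataclass
class Broker:
    """A stateless broker: it holds a reference to shared storage and nothing else."""
    name: str
    storage: SharedLog

    def produce(self, partition: int, record: bytes) -> int:
        return self.storage.append(partition, record)

    def fetch(self, partition: int, offset: int = 0) -> list[bytes]:
        return self.storage.read(partition, offset)


storage = SharedLog()
brokers = [Broker("broker-a", storage), Broker("broker-b", storage)]

# Two different brokers write to the same partition; no leader is involved.
first = brokers[0].produce(0, b"order-created")
second = brokers[1].produce(0, b"order-paid")

# Removing a broker requires no data migration: the log lives in shared storage.
brokers.pop(0)

# A broker added later can serve the full partition immediately.
brokers.append(Broker("broker-c", storage))
records = brokers[-1].fetch(0)
```

The model is single-process, so it sidesteps concurrent writers. In a real deployment, ordering between brokers would have to be coordinated somewhere other than broker memory.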

Metadata Layer (catalog)

Oxia replaces ZooKeeper as the metadata store. It provides scalable, strongly consistent metadata management without the operational complexity of ZooKeeper clusters. The Iceberg Catalog tracks table metadata for lakehouse integration.
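
To make "strongly consistent metadata" concrete, the sketch below shows one common pattern such a store can support: versioned compare-and-set, used here to hand out non-overlapping offset ranges to writers. This is an illustrative assumption, not Oxia's actual API or a description of how Ursa assigns offsets. `MetadataStore`, `compare_and_set`, and `reserve_offsets` are hypothetical names.

```python
class MetadataStore:
    """Toy versioned key-value store (hypothetical; not the Oxia client API)."""

    def __init__(self) -> None:
        # key -> (value, version); a missing key reads as (0, 0)
        self._data: dict[str, tuple[int, int]] = {}

    def get(self, key: str) -> tuple[int, int]:
        return self._data.get(key, (0, 0))

    def compare_and_set(self, key: str, value: int, expected_version: int) -> bool:
        """Write only if the key's version still matches what the caller read."""
        _, version = self.get(key)
        if version != expected_version:
            return False
        self._data[key] = (value, version + 1)
        return True


def reserve_offsets(store: MetadataStore, partition: str, count: int) -> int:
    """Reserve `count` consecutive offsets and return the first one."""
    while True:
        next_offset, version = store.get(partition)
        if store.compare_and_set(partition, next_offset + count, version):
            return next_offset
        # Another writer won the race; re-read and retry.


store = MetadataStore()
base_a = reserve_offsets(store, "orders-0", 3)  # first writer gets offsets 0..2
base_b = reserve_offsets(store, "orders-0", 2)  # second writer gets offsets 3..4

# A write based on a stale version is rejected instead of overwriting state.
stale_write = store.compare_and_set("orders-0", 100, expected_version=0)
```

Because every update is checked against the version the writer last read, two writers can never be granted the same offset, which is the kind of guarantee stateless brokers need from a metadata layer.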

Data Layer (Ursa Stream Storage)

Data is written directly to object storage (S3, GCS, or Azure Blob Storage) through a Write-Ahead Log (WAL) implementation. The storage layer lets you choose between local disks for low latency and shared storage (lakehouse storage) for cost-efficiency. This design:
  • Supports both disk-based and diskless storage modes
  • Stores data in open table formats (Iceberg, Delta Lake)
  • Makes every stream simultaneously queryable as a lakehouse table
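
The idea that one stored copy can serve both as a stream and as a table can be sketched as follows. This is a toy model under stated assumptions: a dictionary stands in for an object storage bucket, and JSON lines stand in for the open table formats named above. `write_batch`, `read_stream`, and `query_table` are hypothetical helpers, not part of any StreamNative API.

```python
import json

# Stand-in for an object storage bucket: object key -> bytes.
bucket: dict[str, bytes] = {}


def write_batch(topic: str, base_offset: int, events: list[dict]) -> str:
    """Write one batch of events as a single object and return its key."""
    key = f"{topic}/{base_offset:020d}.jsonl"  # zero-padded so keys sort by offset
    lines = [
        json.dumps({"offset": base_offset + i, **event})
        for i, event in enumerate(events)
    ]
    bucket[key] = "\n".join(lines).encode("utf-8")
    return key


def scan(topic: str) -> list[dict]:
    """Decode every object under the topic prefix, in offset order."""
    rows: list[dict] = []
    for key in sorted(k for k in bucket if k.startswith(topic + "/")):
        rows.extend(json.loads(line) for line in bucket[key].decode("utf-8").splitlines())
    return rows


def read_stream(topic: str, from_offset: int) -> list[dict]:
    """Stream view: events at or after a given offset, in order."""
    return [row for row in scan(topic) if row["offset"] >= from_offset]


def query_table(topic: str, **equals) -> list[dict]:
    """Table view: rows whose columns match the given values."""
    return [
        row for row in scan(topic)
        if all(row.get(col) == val for col, val in equals.items())
    ]


write_batch("orders", 0, [{"id": 1, "status": "created"}, {"id": 2, "status": "created"}])
write_batch("orders", 2, [{"id": 1, "status": "paid"}])

tail = read_stream("orders", from_offset=1)   # consumer resuming from offset 1
paid = query_table("orders", status="paid")   # analytical filter over the same objects
```

Both views read the same two objects, so no second copy of the data is produced for analytics.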

Lakestream vs Traditional Architectures

The following diagram shows how streaming architecture has evolved from monolithic designs to the Lakestream paradigm:

Figure: Streaming Architecture Evolution: From Monolith to Lakestream

Key benefits

Leaderless and Stateless

No leader elections, no partition rebalancing, no broker disks. Brokers are stateless and interchangeable.

Up to 95% Cost Reduction

Data is written directly to object storage, which eliminates expensive cross-AZ replication between brokers.

Stream-Table Duality

Every event written to a topic simultaneously exists as a row in an Iceberg or Delta Lake table. Zero-copy, no ETL.

Open Formats

Data stored in Iceberg and Delta Lake on your object storage. Query with any engine. No vendor lock-in.

Services powered by Lakestream

Lakestream powers both streaming services on StreamNative Cloud:

Kafka Service

Native Apache Kafka API. Ideal for event streaming, log aggregation, CDC, and IoT telemetry.

Pulsar Service

Apache Pulsar with multi-protocol support. Ideal for messaging, queuing, and multi-tenant workloads.
