Lakestream is the architectural paradigm that unifies real-time data streaming with lakehouse storage. It separates data, metadata, and protocol into three independent layers, enabling both Kafka and Pulsar to run on the same lakehouse-native foundation. StreamNative's Ursa engine implements the storage layer of Lakestream. Ursa supports multiple WAL options: Apache BookKeeper and local disk (KRaft + ISR) for latency-optimized profiles, and object storage for cost-optimized profiles. It received the VLDB 2025 Best Industry Paper award for its novel approach to lakehouse-native stream storage. Read more in the blog post: Ursa Wins VLDB 2025 Best Industry Paper.
Architecture
Lakestream separates the streaming stack into three independent layers:
Protocol Layer (stateless serving)
Brokers are stateless and leaderless. Any broker can handle produce or fetch requests for any partition. There is no leader election, no partition rebalancing, and no broker-to-broker replication. This means:
- Compute scales independently from storage
- Brokers can be added or removed without data migration
- No cross-AZ replication traffic between brokers
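To make the stateless, leaderless idea concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: `SharedStorage` and `Broker` are hypothetical names invented for illustration, not Ursa or StreamNative APIs, and the model ignores concurrency and how offsets are coordinated in a real deployment. It shows that when brokers keep no partition data, any broker can serve any partition, and brokers can come and go without data migration.

```python
import itertools
from collections import defaultdict


class SharedStorage:
    """Stand-in for the shared data layer (hypothetical, not a real Ursa API)."""

    def __init__(self):
        self._logs = defaultdict(list)  # partition -> ordered list of records

    def append(self, partition, record):
        log = self._logs[partition]
        log.append(record)
        return len(log) - 1  # offset of the record just written

    def read(self, partition, offset):
        return self._logs[partition][offset:]


class Broker:
    """Keeps no partition data of its own; every request goes to shared storage."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage

    def produce(self, partition, record):
        return self.storage.append(partition, record)

    def fetch(self, partition, offset=0):
        return self.storage.read(partition, offset)


storage = SharedStorage()
brokers = [Broker(f"broker-{i}", storage) for i in range(3)]

# Spread produce requests for ONE partition across all brokers (no leader).
round_robin = itertools.cycle(brokers)
offsets = [next(round_robin).produce("orders-0", f"event-{i}") for i in range(6)]

# Remove a broker: there is nothing to migrate, because it owned no data.
brokers.pop(0)

# A broker added later can serve the full history immediately.
brokers.append(Broker("broker-3", storage))
history = brokers[-1].fetch("orders-0")

print(offsets)
print(history)
```

In this model the only shared state lives in `SharedStorage`, which is why scaling the broker list up or down never touches the data.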
Metadata Layer (catalog)
Oxia replaces ZooKeeper as the metadata store. It provides scalable, strongly consistent metadata management without the operational complexity of ZooKeeper clusters. The Iceberg Catalog tracks table metadata for lakehouse integration.
Data Layer (Ursa Stream Storage)
Data writes directly to object storage (S3, GCS, or Azure Blob Storage) using a Write-Ahead Log (WAL) implementation. The storage layer provides the flexibility to choose between local disks for low latency and shared storage (lakehouse storage) for cost-efficiency. This design:
- Supports both disk-based and diskless storage modes
- Stores data in open table formats (Iceberg, Delta Lake)
- Makes every stream simultaneously queryable as a lakehouse table
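The trade-off between latency-optimized and cost-optimized WAL backends can be sketched as a small lookup. This is an illustrative assumption, not real configuration: the `Profile` enum, the `choose_wal` helper, and the string labels are hypothetical, and only the pairing of profiles to WAL options comes from this page.

```python
from enum import Enum


class Profile(Enum):
    LATENCY_OPTIMIZED = "latency-optimized"
    COST_OPTIMIZED = "cost-optimized"


# WAL options per profile, as described on this page.
# The string labels are hypothetical identifiers, not real config values.
WAL_OPTIONS = {
    Profile.LATENCY_OPTIMIZED: ("bookkeeper", "local-disk"),
    Profile.COST_OPTIMIZED: ("object-storage",),
}


def choose_wal(profile, preferred=None):
    """Return a WAL backend label that is valid for the given profile."""
    options = WAL_OPTIONS[profile]
    if preferred is None:
        # Arbitrary default for this sketch: the first listed option.
        return options[0]
    if preferred not in options:
        raise ValueError(
            f"{preferred!r} is not a WAL option for the {profile.value} profile"
        )
    return preferred


print(choose_wal(Profile.COST_OPTIMIZED))
print(choose_wal(Profile.LATENCY_OPTIMIZED, preferred="local-disk"))
```

Rejecting an invalid combination early, as `choose_wal` does, mirrors the idea that the WAL choice follows from the profile rather than being set independently.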
Lakestream vs Traditional Architectures
[Diagram: the evolution of streaming architecture from monolithic designs to the Lakestream paradigm]
Key benefits
Leaderless and Stateless
No leader elections, no partition rebalancing, no broker disks. Brokers are stateless and interchangeable.
Up to 95% Cost Reduction
Data writes directly to object storage, eliminating expensive cross-AZ replication between brokers.
Stream-Table Duality
Every event written to a topic simultaneously exists as a row in an Iceberg or Delta Lake table. Zero-copy, no ETL.
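A toy Python model can illustrate what stream-table duality means in practice. It is a sketch only: the `Topic` class and its methods are hypothetical, and a plain in-memory list stands in for the Iceberg or Delta Lake table that the real system would use. The point is that one stored copy of the data serves both an offset-ordered stream read and a table-style scan.

```python
class Topic:
    """One stored copy of the data, exposed through two read paths (toy model)."""

    def __init__(self, name):
        self.name = name
        self._rows = []  # stands in for the lakehouse table backing the topic

    def produce(self, key, value):
        offset = len(self._rows)
        self._rows.append({"offset": offset, "key": key, "value": value})
        return offset

    def consume(self, from_offset=0):
        """Stream view: records in offset order, starting at from_offset."""
        yield from self._rows[from_offset:]

    def scan(self, predicate=lambda row: True):
        """Table view: filter the same rows, with no copy or ETL step."""
        return [row for row in self._rows if predicate(row)]


topic = Topic("payments")
for user, amount in [("alice", 30), ("bob", 120), ("alice", 250)]:
    topic.produce(user, amount)

# Stream consumer resumes from offset 1.
streamed = [record["value"] for record in topic.consume(from_offset=1)]

# Analytical query over the very same rows.
large_payers = [row["key"] for row in topic.scan(lambda row: row["value"] > 100)]

print(streamed)
print(large_payers)
```

Both `consume` and `scan` read from the same `_rows` list, which is the toy equivalent of "zero-copy, no ETL".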
Open Formats
Data stored in Iceberg and Delta Lake on your object storage. Query with any engine. No vendor lock-in.
Services powered by Lakestream
Native Pulsar protocol on the Cost-Optimized profile is coming after the Apache Pulsar 5.0 release. Today, Pulsar Clusters on the Cost-Optimized profile expose the Kafka-compatible protocol. See Cluster Profiles Overview for the full capability matrix.
Kafka Service
Native Apache Kafka API. Ideal for event streaming, log aggregation, CDC, and IoT telemetry.
Pulsar Service
Apache Pulsar with multi-protocol support. Ideal for messaging, queuing, and multi-tenant workloads.
Learn more
- Ursa Engine technical details for storage engine architecture, cluster profiles, and feature comparison
- Choose Kafka or Pulsar to decide which protocol fits your workload
- Ursa: A Lakehouse-Native Data Streaming Engine for Kafka — VLDB 2025 Best Industry Paper