Architecture
Lakestream separates the streaming stack into three independent layers:
Protocol Layer (stateless serving)
Brokers are stateless and leaderless. Any broker can handle produce or fetch requests for any partition. There is no leader election, no partition rebalancing, and no broker-to-broker replication. This means:
- Compute scales independently from storage
- Brokers can be added or removed without data migration
- No cross-AZ replication traffic between brokers
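The leaderless model can be illustrated with a toy simulation. This is a minimal sketch, not Lakestream code: `SharedLog` and `Broker` are hypothetical names, and a single in-process object stands in for the shared storage and metadata layers that a real deployment would use to coordinate offsets.

```python
from collections import defaultdict
from itertools import cycle


class SharedLog:
    """Hypothetical stand-in for shared storage: one append-only list per partition."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def append(self, partition, record):
        log = self._partitions[partition]
        log.append(record)
        return len(log) - 1  # offset of the record just written

    def read(self, partition, offset):
        return self._partitions[partition][offset:]


class Broker:
    """Holds no partition data and no leadership state, only a handle to storage."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage

    def produce(self, partition, record):
        return self.storage.append(partition, record)

    def fetch(self, partition, offset=0):
        return self.storage.read(partition, offset)


storage = SharedLog()
brokers = [Broker(f"broker-{i}", storage) for i in range(3)]

# Round-robin across brokers: each produce request lands on a different broker,
# yet all of them append to the same partition.
rotation = cycle(brokers)
offsets = [next(rotation).produce("orders-0", f"event-{i}") for i in range(4)]

# Remove a broker: there is nothing to migrate, because it owned no data.
brokers.pop(0)
records = brokers[-1].fetch("orders-0")
```

In this sketch, removing `broker-0` loses nothing, because every record lives in the shared log and any remaining broker can serve the full partition.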
Metadata Layer (catalog)
Oxia replaces ZooKeeper as the metadata store. It provides scalable, strongly consistent metadata management without the operational complexity of ZooKeeper clusters. The Iceberg Catalog tracks table metadata for lakehouse integration.
Data Layer (Ursa Stream Storage)
Data writes directly to object storage (S3, GCS, or Azure Blob Storage) using a Write-Ahead Log (WAL) implementation. The storage layer provides the flexibility to choose between local disks for low latency and shared storage (lakehouse storage) for cost-efficiency. This design:
- Supports both disk-based and diskless storage modes
- Stores data in open table formats (Iceberg, Delta Lake)
- Makes every stream simultaneously queryable as a lakehouse table
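The idea that one set of stored objects serves both stream reads and table queries can be sketched as follows. This is an illustrative toy under stated assumptions: `ObjectStore`, `StreamWriter`, `read_stream`, and `scan_table` are hypothetical names, the store is an in-memory dictionary rather than S3, and batches are JSON rather than the Iceberg or Delta Lake formats the real storage layer uses.

```python
import json


class ObjectStore:
    """Hypothetical in-memory stand-in for an object store: write-once objects by key."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        if key in self._objects:
            raise KeyError(f"object already exists: {key}")
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def list(self, prefix):
        return sorted(k for k in self._objects if k.startswith(prefix))


class StreamWriter:
    """Appends each batch of events as one log object, assigning offsets."""

    def __init__(self, store, topic):
        self.store = store
        self.topic = topic
        self.next_offset = 0

    def write_batch(self, events):
        base = self.next_offset
        rows = [{"offset": base + i, **event} for i, event in enumerate(events)]
        # Zero-padded base offset keeps lexicographic key order equal to log order.
        key = f"{self.topic}/wal/{base:020d}.json"
        self.store.put(key, json.dumps(rows).encode())
        self.next_offset += len(rows)
        return key


def read_stream(store, topic, from_offset=0):
    """Stream view: rows in offset order, starting at from_offset."""
    for key in store.list(f"{topic}/wal/"):
        for row in json.loads(store.get(key)):
            if row["offset"] >= from_offset:
                yield row


def scan_table(store, topic, predicate):
    """Table view: filter rows from the same objects, with no separate copy."""
    return [row for row in read_stream(store, topic) if predicate(row)]


store = ObjectStore()
writer = StreamWriter(store, "orders")
writer.write_batch([{"user": "a", "amount": 5}, {"user": "b", "amount": 12}])
writer.write_batch([{"user": "a", "amount": 30}])

tail = list(read_stream(store, "orders", from_offset=1))
user_a = scan_table(store, "orders", lambda row: row["user"] == "a")
```

Both `tail` and `user_a` are computed from the same two stored objects, which is the point of the sketch: the consumer-style read and the query-style scan differ only in how they traverse the data, not in where it lives.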
Lakestream vs Traditional Architectures
The following diagram shows how streaming architecture has evolved from monolithic designs to the Lakestream paradigm:
Key benefits
Leaderless and Stateless
No leader elections, no partition rebalancing, no broker disks. Brokers are stateless and interchangeable.
Up to 95% Cost Reduction
Data writes directly to object storage, eliminating expensive cross-AZ replication between brokers.
Stream-Table Duality
Every event written to a topic simultaneously exists as a row in an Iceberg or Delta Lake table. Zero-copy, no ETL.
Open Formats
Data stored in Iceberg and Delta Lake on your object storage. Query with any engine. No vendor lock-in.
Services powered by Lakestream
Lakestream powers both streaming services on StreamNative Cloud:
Kafka Service
Native Apache Kafka API. Ideal for event streaming, log aggregation, CDC, and IoT telemetry.
Pulsar Service
Apache Pulsar with multi-protocol support. Ideal for messaging, queuing, and multi-tenant workloads.
Learn more
- Ursa Engine technical details for storage engine architecture, cluster profiles, and feature comparison
- Choose Kafka or Pulsar to decide which protocol fits your workload
- Ursa: A Lakehouse-Native Data Streaming Engine for Kafka — VLDB 2025 Best Industry Paper