
# Lakestream Architecture

> Lakestream is the cloud-native architecture that unifies streaming and lakehouse storage. It powers both StreamNative Kafka Service and Pulsar Service.

Lakestream is the architectural paradigm that unifies real-time data streaming with lakehouse storage. It separates data, metadata, and protocol into three independent layers, enabling both Kafka and Pulsar to run on the same lakehouse-native foundation.

StreamNative's [Ursa engine](/cloud/overview/data-streaming-engine) implements the storage layer of Lakestream. Ursa supports multiple write-ahead log (WAL) options: Apache BookKeeper and local disk (KRaft + ISR) for latency-optimized profiles, and object storage for cost-optimized profiles. It received the **VLDB 2025 Best Industry Paper** award for its novel approach to lakehouse-native stream storage. Read more in the blog post: [Ursa Wins VLDB 2025 Best Industry Paper](https://streamnative.io/blog/ursa-wins-vldb-2025-best-industry-paper-the-first-lakehouse-native-streaming-engine-for-kafka).

## Architecture

Lakestream separates the streaming stack into three independent layers:

<img src="https://mintcdn.com/streamnative/CDw7JxphxyMzStpZ/media/lakestream-architecture.png?fit=max&auto=format&n=CDw7JxphxyMzStpZ&q=85&s=196227f3e0af2eee80bf12b03dbcad54" alt="Lakestream Architecture" width="2186" height="1228" data-path="media/lakestream-architecture.png" />

### Protocol Layer (stateless serving)

Brokers are **stateless and leaderless**. Any broker can handle produce or fetch requests for any partition. There is no leader election, no partition rebalancing, and no broker-to-broker replication. This means:

* Compute scales independently from storage
* Brokers can be added or removed without data migration
* No cross-AZ replication traffic between brokers
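
To make the consequences above concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: `SharedLog` and `Broker` are hypothetical names invented for illustration, an in-memory dictionary stands in for the shared storage layer, and nothing here reflects Ursa's actual implementation or API. It shows that when brokers hold no partition data, any broker can serve any partition, and brokers can be added or removed without moving data.

```python
from dataclasses import dataclass, field


@dataclass
class SharedLog:
    """Hypothetical stand-in for the shared storage layer (in-memory toy)."""
    partitions: dict = field(default_factory=dict)

    def append(self, partition: str, record: bytes) -> int:
        log = self.partitions.setdefault(partition, [])
        log.append(record)
        return len(log) - 1  # offset of the record just written

    def read(self, partition: str, offset: int) -> list:
        return self.partitions.get(partition, [])[offset:]


@dataclass
class Broker:
    """Hypothetical broker that keeps no partition data of its own."""
    name: str
    storage: SharedLog

    def produce(self, partition: str, record: bytes) -> int:
        return self.storage.append(partition, record)

    def fetch(self, partition: str, offset: int = 0) -> list:
        return self.storage.read(partition, offset)


storage = SharedLog()
brokers = [Broker(f"broker-{i}", storage) for i in range(3)]

# Produce to the same partition through three different brokers.
offsets = [
    broker.produce("orders-0", f"event-{i}".encode())
    for i, broker in enumerate(brokers)
]

# Remove a broker: no data moves, because brokers never held any.
brokers.pop(0)

# A newly added broker can serve the full partition immediately.
brokers.append(Broker("broker-3", storage))
records = brokers[-1].fetch("orders-0")
```

In this toy, ordering is trivially safe because everything runs in one process. A real leaderless design needs the metadata layer to coordinate offsets across brokers, which this sketch does not attempt to model.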

### Metadata Layer (catalog)

**Oxia** replaces ZooKeeper as the metadata store. It provides scalable, strongly consistent metadata management without the operational complexity of ZooKeeper clusters. The **Iceberg Catalog** tracks table metadata for lakehouse integration.

### Data Layer (Ursa Stream Storage)

The data layer persists records through a write-ahead log (WAL). In cost-optimized profiles, the WAL writes directly to object storage (S3, GCS, or Azure Blob Storage); latency-optimized profiles can use local disks instead. You can therefore choose between local disks for low latency and shared (lakehouse) storage for cost efficiency. This design:

* Supports both disk-based and diskless storage modes
* Stores data in open table formats (Iceberg, Delta Lake)
* Makes every stream simultaneously queryable as a lakehouse table
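
The last point, one stored copy of the data serving both stream reads and table queries, can be illustrated with a small Python sketch. This is a conceptual toy, not how Ursa stores data: `StreamTable` and its methods are hypothetical names, and a Python list stands in for table files in Iceberg or Delta Lake format on object storage.

```python
class StreamTable:
    """One stored copy of the data, exposed through two read paths (toy model)."""

    def __init__(self):
        self._rows = []  # hypothetical stand-in for table files on object storage

    def append(self, event: dict) -> int:
        """Write an event once; it becomes both a stream record and a table row."""
        offset = len(self._rows)
        self._rows.append({"offset": offset, **event})
        return offset

    def read_stream(self, from_offset: int = 0) -> list:
        """Stream view: ordered records starting at a given offset."""
        return self._rows[from_offset:]

    def query(self, predicate) -> list:
        """Table view: filter rows, as a query engine would."""
        return [row for row in self._rows if predicate(row)]


topic = StreamTable()
topic.append({"user": "ana", "amount": 30})
topic.append({"user": "bo", "amount": 75})
topic.append({"user": "ana", "amount": 120})

# A consumer resumes from offset 1; an analyst filters the same data as rows.
tail = topic.read_stream(from_offset=1)
large = topic.query(lambda row: row["amount"] > 50)
```

Both `tail` and `large` are read from the same underlying list, so no second copy or ETL step sits between the streaming path and the query path.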

## Lakestream vs Traditional Architectures

The following diagram shows how streaming architecture has evolved from monolithic designs to the Lakestream paradigm:

<img src="https://mintcdn.com/streamnative/CDw7JxphxyMzStpZ/media/lakestream-evolution.png?fit=max&auto=format&n=CDw7JxphxyMzStpZ&q=85&s=54f853a16dfe621e27f1b46fb39d4e38" alt="Streaming Architecture Evolution: From Monolith to Lakestream" width="2186" height="1228" data-path="media/lakestream-evolution.png" />

## Key benefits

<CardGroup cols={2}>
  <Card title="Leaderless and Stateless" icon="circle-nodes">
    No leader elections, no partition rebalancing, no broker disks. Brokers are stateless and interchangeable.
  </Card>

  <Card title="Up to 95% Cost Reduction" icon="piggy-bank">
    Data writes directly to object storage, eliminating expensive cross-AZ replication between brokers.
  </Card>

  <Card title="Stream-Table Duality" icon="table">
    Every event written to a topic simultaneously exists as a row in an Iceberg or Delta Lake table. Zero-copy, no ETL.
  </Card>

  <Card title="Open Formats" icon="lock-open">
    Data stored in Iceberg and Delta Lake on your object storage. Query with any engine. No vendor lock-in.
  </Card>
</CardGroup>

## Services powered by Lakestream

<Note>
  Native Pulsar protocol on the Cost-Optimized profile is coming after the Apache Pulsar 5.0 release. Today, Pulsar Clusters on the Cost-Optimized profile expose the Kafka-compatible protocol. See [Cluster Profiles Overview](/cloud/clusters/cluster-profiles-overview) for the full capability matrix.
</Note>

Lakestream powers both streaming services on StreamNative Cloud:

<CardGroup cols={2}>
  <Card title="Kafka Service" icon="chart-line" href="/kafka/overview">
    Native Apache Kafka API. Ideal for event streaming, log aggregation, CDC, and IoT telemetry.
  </Card>

  <Card title="Pulsar Service" icon="wave-pulse" href="/cloud/overview/cloud-overview">
    Apache Pulsar with multi-protocol support. Ideal for messaging, queuing, and multi-tenant workloads.
  </Card>
</CardGroup>

## Learn more

* [Ursa Engine technical details](/cloud/overview/data-streaming-engine) for storage engine architecture, cluster profiles, and feature comparison
* [Choose Kafka or Pulsar](/cloud/overview/choose-kafka-or-pulsar) to decide which protocol fits your workload
* [Ursa: A Lakehouse-Native Data Streaming Engine for Kafka](https://vldb.org/pvldb/volumes/18/paper/Ursa%3A%20A%20Lakehouse-Native%20Data%20Streaming%20Engine%20for%20Kafka) (VLDB 2025 Best Industry Paper)
