
Lakehouse Tiered Storage Overview

Note

This feature is currently in private preview. If you want to try it out or have any questions, submit a ticket to the support team.

Introduction

Lakehouse Tiered Storage turns Apache Pulsar into a Lakehouse by integrating it with Lakehouse products such as Delta Lake. It offloads topic data to those products in streaming mode, and the offloaded data can then be read both as a stream and in batch.

[Figure: Lakehouse Tiered Storage]

Collaboration with Pulsar and Lakehouse Ecosystems

Lakehouse Tiered Storage works with both Apache Pulsar and the Lakehouse ecosystem, bridging real-time data processing in Pulsar and batch processing in Lakehouse products.

[Figure: Lakehouse Tiered Storage Ecosystem]

Architecture Overview

The architecture of Lakehouse Tiered Storage is designed to make offloading data to Lakehouse products, and reading it back, efficient within the integrated Pulsar-Lakehouse environment.

[Figure: Lakehouse Tiered Storage Architecture]

Key Capabilities of Lakehouse Tiered Storage

Data Offloading to Lakehouse

Lakehouse Tiered Storage offloads data from Pulsar topics to Lakehouse products in streaming mode and supports open formats such as Delta Lake.
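To make "streaming mode" concrete, the following is a toy model only, not the actual offloader. The class `TieredTopic` and its `hot_limit` policy are hypothetical names introduced for illustration; a deque stands in for the hot tier and a plain list stands in for a Lakehouse table. The sketch also assumes entries leave the hot tier as soon as they are offloaded, which is a simplification.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TieredTopic:
    """Toy topic with a hot tier and a cold tier (illustrative, not the real offloader)."""

    hot_limit: int                              # hypothetical policy: max entries kept hot
    hot: deque = field(default_factory=deque)   # stands in for the hot tier
    cold: list = field(default_factory=list)    # stands in for a Lakehouse table

    def publish(self, message: str) -> None:
        self.hot.append(message)
        # Streaming offload: move the oldest entries as soon as the limit is
        # exceeded, instead of waiting for a periodic batch job.
        while len(self.hot) > self.hot_limit:
            self.cold.append(self.hot.popleft())


topic = TieredTopic(hot_limit=2)
for i in range(5):
    topic.publish(f"msg-{i}")

print(list(topic.hot))  # ['msg-3', 'msg-4']
print(topic.cold)       # ['msg-0', 'msg-1', 'msg-2']
```

The point of the sketch is that offloading happens continuously on the write path, so the cold tier is never more than one policy threshold behind the topic.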

Streaming Read from Lakehouse

Users can perform streaming reads of tables stored in Lakehouse products with Pulsar clients, so offloaded data remains accessible to real-time applications.
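As a rough illustration of what a streaming read across tiers means for a client, here is a minimal sketch under simplifying assumptions. The function `read_from` is hypothetical, positions are plain integers rather than real Pulsar message IDs, and two lists stand in for offloaded Lakehouse data and for data still in the hot tier.

```python
from typing import Iterator, List, Tuple


def read_from(cold: List[str], hot: List[str], start: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (position, message) pairs in order, crossing from the cold tier to the hot tier.

    `cold` stands in for offloaded Lakehouse data and `hot` for data that has
    not been offloaded yet; both are assumptions of this sketch.
    """
    for position, message in enumerate(cold + hot):
        if position >= start:
            yield position, message


cold_tier = ["msg-0", "msg-1", "msg-2"]   # already offloaded
hot_tier = ["msg-3", "msg-4"]             # not yet offloaded

for position, message in read_from(cold_tier, hot_tier, start=2):
    print(position, message)
# 2 msg-2
# 3 msg-3
# 4 msg-4
```

The reader sees one continuous, ordered log; the tier boundary between position 2 and position 3 is invisible to it.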

Batch Read from Lakehouse

For batch processing, Lakehouse Tiered Storage supports batch reads from Lakehouse products through query engines such as Spark SQL, Flink SQL, and Trino.

Benefits of Lakehouse Tiered Storage

  • Long-Term Data Retention: Define offload policies that keep data in BookKeeper for real-time processing and in Lakehouse products for batch processing, so data can be retained long term.
  • Cost-Effective Storage: Store cold data in Lakehouse products, where open formats and compression reduce storage cost.
  • Unified Data Platform: Pulsar serves as a single data storage and processing platform for both real-time and batch workloads.
  • Schema Evolution Management: Lakehouse Tiered Storage handles schema evolution, keeping Lakehouse tables in sync with the schemas of Pulsar topics.
  • Data Query and Analysis: Query data directly in Lakehouse products, and use Pulsar consumers and readers to access data from both BookKeeper and Lakehouse products.
  • Advanced Data Management Features: Benefit from data versioning, auditing, indexing, caching, and query optimization, combining the advantages of data lakes and data warehouses.
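The schema evolution point can be illustrated with a small sketch. It does not show how the feature is implemented: the function `evolve_table_schema` is hypothetical, schemas are reduced to field-name-to-type dictionaries, and the sketch assumes that only additive changes are accepted, which the text above does not state.

```python
from typing import Dict


def evolve_table_schema(table: Dict[str, str], topic: Dict[str, str]) -> Dict[str, str]:
    """Return a table schema that also covers the fields of a newer topic schema.

    Assumption for this sketch: only additive changes are accepted; a field
    whose type changes is rejected.
    """
    evolved = dict(table)  # copy so the caller's schema is left untouched
    for name, type_name in topic.items():
        if name in evolved and evolved[name] != type_name:
            raise ValueError(f"incompatible type change for field {name!r}")
        evolved.setdefault(name, type_name)
    return evolved


table_schema = {"id": "long", "name": "string"}
topic_schema_v2 = {"id": "long", "name": "string", "email": "string"}

print(evolve_table_schema(table_schema, topic_schema_v2))
# {'id': 'long', 'name': 'string', 'email': 'string'}
```

Existing fields keep their place and new fields are appended, so rows written under the older schema stay readable after the table schema grows.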
