- Metadata (schema, manifest lists, snapshots)
- Data files (Parquet/columnar format)
- Transaction logs (for table evolution)
- Apache Iceberg
- Delta Lake (Delta 2.0 and above)
- Ingested from the streaming topic
- Serialized into Parquet files
- Committed into the table as immutable snapshots
- Made available for SQL queries and analytical engines
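The four steps above can be illustrated with a small in-memory sketch. It is not the product's implementation: `DataFile` stands in for a Parquet file, `Table` for an Iceberg or Delta table, and the `data/file-00001.parquet` naming, the `batch_size` parameter, and every class and function name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Stand-in for one Parquet file: an immutable batch of rows."""
    path: str
    rows: tuple


@dataclass(frozen=True)
class Snapshot:
    """An immutable view of the table: every data file visible at commit time."""
    snapshot_id: int
    files: tuple


class Table:
    def __init__(self):
        self.snapshots = []  # append-only; existing snapshots are never modified

    def commit(self, data_file):
        # A commit creates a new snapshot that adds one file to the previous view.
        visible = self.snapshots[-1].files if self.snapshots else ()
        snapshot = Snapshot(len(self.snapshots) + 1, visible + (data_file,))
        self.snapshots.append(snapshot)
        return snapshot

    def scan(self):
        # Queries read the latest committed snapshot only.
        if not self.snapshots:
            return []
        return [dict(row) for f in self.snapshots[-1].files for row in f.rows]


def _flush(batch, table):
    # "Serialize" the batch into a data file (hypothetical path) and commit it.
    path = f"data/file-{len(table.snapshots) + 1:05d}.parquet"
    table.commit(DataFile(path, tuple(batch)))


def ingest(records, table, batch_size=2):
    """Read records from a topic stand-in and commit them in batches."""
    batch = []
    for record in records:
        batch.append(dict(record))
        if len(batch) == batch_size:
            _flush(batch, table)
            batch = []
    if batch:
        _flush(batch, table)


topic = [
    {"id": 1, "event": "click"},
    {"id": 2, "event": "view"},
    {"id": 3, "event": "click"},
]
table = Table()
ingest(topic, table)
```

With three records and a batch size of two, the sketch commits two snapshots, and `table.scan()` returns all three rows from the latest one.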
- Amazon S3
- Google Cloud Storage
- Azure Blob Storage
- Schema inference from topics
- Backward/forward compatible evolution
- Safe writes with schema enforcement
- Automatic mapping to Iceberg/Delta schemas
- Users retain full control over table evolution policies.
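As a rough illustration of compatible evolution and schema enforcement, the sketch below applies a deliberately strict toy policy: adding an optional field is allowed, while removing a field, changing a type, or adding a required field is rejected. These rules are an assumption made for the example; the actual Iceberg and Delta rules differ (Iceberg, for instance, permits certain type promotions), and `Field`, `evolution_errors`, and `enforce` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str            # one of the keys in PY_TYPES below
    required: bool = False


# Toy mapping from schema type names to Python types (assumption for the sketch).
PY_TYPES = {"string": str, "long": int, "double": float}


def evolution_errors(old, new):
    """Return reasons why `new` is not a compatible evolution of `old`."""
    errors = []
    old_by_name = {f.name: f for f in old}
    new_by_name = {f.name: f for f in new}
    for name, field in old_by_name.items():
        if name not in new_by_name:
            errors.append(f"field removed: {name}")
        elif new_by_name[name].type != field.type:
            errors.append(
                f"type changed: {name} {field.type} -> {new_by_name[name].type}"
            )
    for name, field in new_by_name.items():
        if name not in old_by_name and field.required:
            errors.append(f"new required field: {name}")
    return errors


def enforce(schema, record):
    """Raise ValueError if `record` does not fit `schema` (safe-write check)."""
    known = {f.name for f in schema}
    unknown = set(record) - known
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    for field in schema:
        value = record.get(field.name)
        if value is None:
            if field.required:
                raise ValueError(f"missing required field: {field.name}")
            continue
        if not isinstance(value, PY_TYPES[field.type]):
            raise ValueError(f"wrong type for field: {field.name}")


v1 = (Field("id", "long", required=True), Field("event", "string"))
v2 = v1 + (Field("country", "string"),)    # adds an optional field
v3 = (Field("id", "string", required=True),)  # changes a type, drops a field
```

Under this toy policy, `evolution_errors(v1, v2)` is empty, while `evolution_errors(v1, v3)` reports both the type change and the removed field.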
- ACID transactions
- Snapshot isolation
- Time travel (via historical snapshots)
- Incremental reads
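Snapshot isolation, time travel, and incremental reads all follow from keeping every committed snapshot immutable. The sketch below shows the idea for an append-only table; `SnapshotLog` and its methods are hypothetical names, not an Iceberg or Delta API, and real table formats track files and metadata rather than rows in memory.

```python
class SnapshotLog:
    """Toy append-only table that keeps every committed snapshot."""

    def __init__(self):
        # Snapshot 0 is the empty table; each later entry is an immutable tuple.
        self._snapshots = [()]

    def commit(self, rows):
        """Atomically publish a new snapshot and return its id."""
        self._snapshots.append(self._snapshots[-1] + tuple(rows))
        return len(self._snapshots) - 1

    def latest(self):
        return len(self._snapshots) - 1

    def read(self, snapshot_id=None):
        """Read the latest snapshot, or an older one (time travel)."""
        if snapshot_id is None:
            snapshot_id = self.latest()
        return list(self._snapshots[snapshot_id])

    def read_incremental(self, after_id, up_to_id=None):
        """Return only the rows appended after `after_id`, up to `up_to_id`."""
        if up_to_id is None:
            up_to_id = self.latest()
        already_seen = len(self._snapshots[after_id])
        return list(self._snapshots[up_to_id][already_seen:])


log = SnapshotLog()
s1 = log.commit([{"id": 1}])
pinned = log.read(s1)                 # a reader pinned to snapshot s1
s2 = log.commit([{"id": 2}, {"id": 3}])

still_pinned = log.read(s1)           # unchanged by the later commit
new_rows = log.read_incremental(s1)   # only what arrived after s1
```

A reader pinned to `s1` sees the same rows before and after the second commit, and `read_incremental(s1)` returns only the two rows added by it.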
- Full Interoperability with Data and AI Platforms
- Databricks
- Snowflake
- BigQuery Managed Tables
- Apache Spark, Flink, and Trino
- StarTree and Pinot
- DuckDB
- pandas and PyArrow