This feature adds a __key column to the lakehouse table that contains the original message key from the source topic. Use it when downstream queries need access to the partition key produced by the application, for example joining the lakehouse table with another dataset on the producer’s key, computing per-key aggregations, or auditing message routing.
What Gets Persisted
When the feature is enabled, every record written to the lakehouse table includes a __key column populated with the source message’s key:
| Source | Key value persisted |
|---|---|
| Pulsar producer | The Pulsar message key (MessageBuilder.key(...)), or null if the producer did not set one |
| Kafka producer | The Kafka record key bytes, or null if the producer did not set one |
The key is stored as a binary value, preserving the exact bytes the producer sent.
Configuration
| Property | Default | Description |
|---|---|---|
| persistKey | false | Master switch. When true, the __key column is added to the table and populated for every record. |
Required Companion Settings
The __key column is added through schema evolution, so schema evolution must remain enabled:
| Property | Default | Description |
|---|---|---|
| tableEvolveSchemaEnabled | true | Required (the column is added by schema evolution). Only override if previously disabled. |
persistKey does not require Variant support, Iceberg V3, or any other feature flag.
Configuration
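Add the following to the compaction service custom config: persistKey=true. Leave tableEvolveSchemaEnabled at its default (true), or set it explicitly if it was previously disabled.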
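The exact format of the custom config is not shown here, so the following is only a sketch. It assumes the config is a flat key/value map that can be serialized as JSON and that the values are booleans; both are assumptions. The helper `build_custom_config` is hypothetical. It simply refuses the one combination that the Required Companion Settings section rules out.

```python
import json


def build_custom_config(persist_key: bool = True, evolve_schema: bool = True) -> dict:
    """Assemble the two properties named in this page (hypothetical helper)."""
    if persist_key and not evolve_schema:
        # The __key column is added by schema evolution, so this pair cannot work.
        raise ValueError("persistKey=true requires tableEvolveSchemaEnabled=true")
    return {
        "persistKey": persist_key,
        "tableEvolveSchemaEnabled": evolve_schema,
    }


# Assumed serialization; adapt to whatever format your deployment expects.
print(json.dumps(build_custom_config(), indent=2))
```

Checking the pair before deployment keeps a previously disabled tableEvolveSchemaEnabled from silently blocking the new column.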
Querying the Key
Once enabled, the __key column appears in the lakehouse table alongside the user-defined fields. Examples:
Spark SQL: cast the column to STRING for readability, for example CAST(__key AS STRING). For binary keys, query the column directly as BINARY.
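For rows read into a client application, the same distinction between text keys and binary keys can be sketched in Python. This assumes the client library returns the binary column as `bytes` (or `bytearray`) and nulls as `None`. The helper `render_key` is hypothetical and is not part of any lakehouse API.

```python
from typing import Optional


def render_key(raw: Optional[bytes]) -> Optional[str]:
    """Return a printable form of a persisted key, or None for a null key."""
    if raw is None:
        # The message had no key, or the row predates the feature.
        return None
    data = bytes(raw)  # also accepts bytearray
    try:
        # Text keys: decode as UTF-8, the client-side analogue of a STRING cast.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Binary keys: show the exact bytes as hex instead of guessing an encoding.
        return "0x" + data.hex()


print(render_key(b"order-42"))      # order-42
print(render_key(b"\xff\x00\x01"))  # 0xff0001
print(render_key(None))             # None
```

Falling back to hex keeps the exact bytes visible, which matters when the key is later compared against the producer's original value.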
Behavior Notes
- Adding the key column to existing tables. Because the column is added through schema evolution (tableEvolveSchemaEnabled=true), enabling the flag on a topic that already has a lakehouse table appends the __key column on the next compaction. Records written before the flag was enabled will have a null value in the column.
- Disabling the feature. Setting persistKey back to false stops new records from receiving the key, but the column itself is not removed. Older rows retain their values.
- Null keys. Messages without a key are written with null in the __key column.
- Combining with upsert. The __key column is independent of identifier fields used for upsert. It records the producer-side key as-is and is not used for deduplication.
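The lifecycle described in these notes can be illustrated with a small runnable model. This sketch uses an in-memory SQLite table purely as a stand-in for the lakehouse table; it is not how the compaction service works. `ALTER TABLE ... ADD COLUMN` plays the role of schema evolution, and the table and column names other than __key are hypothetical. It also assumes that rows written after the flag is turned off carry a null key.

```python
import sqlite3


def simulate_persist_key_lifecycle():
    """Model the notes above with SQLite as a stand-in (not the real engine)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")

    # Written before persistKey was enabled: the key column does not exist yet.
    conn.execute("INSERT INTO events VALUES (1, 'a')")

    # Stand-in for schema evolution appending the column on the next compaction.
    conn.execute("ALTER TABLE events ADD COLUMN __key BLOB")

    # Written while persistKey=true: one keyed message, one without a key.
    conn.execute("INSERT INTO events VALUES (2, 'b', ?)", (b"order-42",))
    conn.execute("INSERT INTO events VALUES (3, 'c', NULL)")

    # Written after persistKey was set back to false: the column is still there,
    # but the new row receives no key (assumed to surface as null).
    conn.execute("INSERT INTO events (id, payload) VALUES (4, 'd')")

    rows = conn.execute("SELECT id, __key FROM events ORDER BY id").fetchall()
    conn.close()
    return rows


for row in simulate_persist_key_lifecycle():
    print(row)
```

In this model a null in __key has three possible causes (a row older than the feature, a message without a key, or a row written after the feature was disabled), so a null alone does not prove that the producer omitted the key.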
Related
- Persist Extra Metadata: Companion feature that persists message envelope metadata (offset, publish time, properties) as a Variant column
- Schema Evolution: The mechanism by which the __key column is added to the table
- Upsert: For deduplication semantics on a primary key, which is a different concern from persisting the source message key