The Persist Key feature injects a __key column into the lakehouse table that contains the original message key from the source topic. Use it when downstream queries need access to the partition key produced by the application — for example, joining the lakehouse table with another dataset on the producer’s key, computing per-key aggregations, or auditing message routing.

What Gets Persisted

When the feature is enabled, every record written to the lakehouse table includes a __key column populated with the source message’s key:
| Source | Key value persisted |
|---|---|
| Pulsar producer | The Pulsar message key (MessageBuilder.key(...)), or null if the producer did not set one |
| Kafka producer | The Kafka record key bytes, or null if the producer did not set one |
The column is added to the table by schema evolution and stored as a binary value, preserving the exact bytes the producer sent.
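To make the mapping concrete, here is a minimal Python sketch of what ends up in a single row. It is not the connector's implementation. `build_row` and its parameters are hypothetical names, and the sketch assumes the key reaches the writer as raw bytes, or `None` when the producer set no key.

```python
from typing import Any, Optional


def build_row(
    fields: dict[str, Any],
    key: Optional[bytes],
    persist_key: bool,
) -> dict[str, Any]:
    """Hypothetical model of one lakehouse row built from a source message.

    fields: user-defined fields decoded from the message value.
    key: raw message key bytes, or None if the producer set no key.
    persist_key: models the persistKey setting.
    """
    row = dict(fields)  # copy so the caller's dict is not mutated
    if persist_key:
        # The key is stored as-is (binary), with no decoding or re-encoding;
        # a missing key becomes null (None), not an empty value.
        row["__key"] = key
    # With persist_key False, no __key field is produced at all.
    return row


print(build_row({"amount": 10}, b"user-42", persist_key=True))
# {'amount': 10, '__key': b'user-42'}
print(build_row({"amount": 5}, None, persist_key=True))
# {'amount': 5, '__key': None}
```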

Configuration

| Property | Default | Description |
|---|---|---|
| persistKey | false | Master switch. When true, the __key column is added to the table and populated for every record. |

Required Companion Settings

The __key column is added through schema evolution, so schema evolution must remain enabled:
| Property | Default | Description |
|---|---|---|
| tableEvolveSchemaEnabled | true | Must be true, because the __key column is added through schema evolution. Set it explicitly only if it was previously disabled. |
persistKey does not require Variant support, Iceberg V3, or any other feature flag.

Example Configuration

Add the following to the compaction service custom config:
persistKey: "true"
tableEvolveSchemaEnabled: "true"   # default; only override if previously disabled
The same configuration applies to both Iceberg and Delta Lake.
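Because a disabled companion setting is easy to overlook, a pre-deployment check can help. The following Python sketch is hypothetical: `check_persist_key_config` is not part of any StreamNative tooling. It treats the custom config as a dict of string values, as in the example above, and applies the rules stated on this page: `persistKey` defaults to `false`, `tableEvolveSchemaEnabled` defaults to `true`, and the latter must not be `false` when the former is `true`.

```python
# Defaults documented on this page; values are strings, as in the custom config.
DEFAULTS = {"persistKey": "false", "tableEvolveSchemaEnabled": "true"}


def _is_true(value: str) -> bool:
    # Assumption: values are compared case-insensitively against "true".
    return value.strip().lower() == "true"


def check_persist_key_config(config: dict[str, str]) -> list[str]:
    """Return a list of problems with the Persist Key settings (empty if none)."""
    merged = {**DEFAULTS, **config}  # user-supplied values override the defaults
    problems = []
    if _is_true(merged["persistKey"]) and not _is_true(merged["tableEvolveSchemaEnabled"]):
        problems.append(
            "persistKey is true but tableEvolveSchemaEnabled is not: "
            "the __key column is added by schema evolution"
        )
    return problems


print(check_persist_key_config({"persistKey": "true"}))  # []
print(check_persist_key_config({"persistKey": "true", "tableEvolveSchemaEnabled": "false"}))
```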

Querying the Key

Once enabled, the __key column appears in the lakehouse table alongside the user-defined fields. For example, in Spark SQL:
-- Messages for a single key
SELECT CAST(__key AS STRING) AS key,
       *
FROM iceberg_catalog.namespace.events
WHERE CAST(__key AS STRING) = 'user-42'
LIMIT 10;
-- Per-key counts
SELECT CAST(__key AS STRING) AS key, COUNT(*) AS message_count
FROM iceberg_catalog.namespace.events
GROUP BY CAST(__key AS STRING)
ORDER BY message_count DESC;
If your producer keys are UTF-8 strings, cast the column to STRING for readability. For binary keys, query the column directly as BINARY.
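The same choice applies when rows are read outside SQL, for example after collecting query results into a client application. A minimal Python sketch, assuming the `__key` value arrives as `bytes` (or `None` for null). `display_key` is a hypothetical helper that decodes UTF-8 keys and falls back to hexadecimal for other binary keys instead of failing.

```python
from typing import Optional


def display_key(raw: Optional[bytes]) -> str:
    """Render a __key value for humans: UTF-8 text if possible, else hex."""
    if raw is None:
        return "<null>"
    try:
        # Strict decoding raises UnicodeDecodeError on invalid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Binary keys that are not valid UTF-8 are shown as hex.
        return "0x" + raw.hex()


print(display_key(b"user-42"))   # user-42
print(display_key(b"\xff\x01"))  # 0xff01
```

The fallback is a heuristic: a binary key whose bytes happen to be valid UTF-8 (such as `b"\x00\x00\x00*"`) is still returned as text, so prefer an explicit decoding when you know the key format.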

Behavior Notes

  • Adding the key column to existing tables. Because the column is added through schema evolution (tableEvolveSchemaEnabled=true), enabling the flag on a topic that already has a lakehouse table appends the __key column on the next compaction. Records written before the flag was enabled will have a null value in the column.
  • Disabling the feature. Setting persistKey back to false stops new records from receiving the key, but the column itself is not removed. Older rows retain their values.
  • Null keys. Messages without a key are written with null in the __key column.
  • Combining with upsert. The __key column is independent of identifier fields used for upsert. It records the producer-side key as-is and is not used for deduplication.
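Taken together, these notes mean a null __key has several possible causes. The toy Python model below (`LakehouseTable` is hypothetical, not a real API) replays that lifecycle: a row written before the flag is enabled, a keyed message, a keyless message, and a row written after the flag is turned off again. It assumes that rows written after disabling read back with a null __key, which the notes imply but do not state outright.

```python
from typing import Any, Optional


class LakehouseTable:
    """Toy model of a table whose schema can gain a __key column."""

    def __init__(self) -> None:
        self.columns: list[str] = ["value"]
        self.rows: list[dict[str, Any]] = []

    def write(self, value: Any, key: Optional[bytes], persist_key: bool) -> None:
        if persist_key and "__key" not in self.columns:
            # Schema evolution: append the column; existing rows read it as null.
            self.columns.append("__key")
        row: dict[str, Any] = {"value": value}
        if persist_key:
            row["__key"] = key  # None when the producer set no key
        self.rows.append(row)

    def scan(self) -> list[dict[str, Any]]:
        # A column missing from a stored row is read back as null (None).
        return [{c: r.get(c) for c in self.columns} for r in self.rows]


table = LakehouseTable()
table.write("a", b"k1", persist_key=False)  # before the flag is enabled
table.write("b", b"k2", persist_key=True)   # key persisted
table.write("c", None, persist_key=True)    # producer set no key
table.write("d", b"k4", persist_key=False)  # flag disabled again; column stays

for row in table.scan():
    print(row)
```

Rows a, c, and d all read back with a null __key, so a filter such as WHERE __key IS NULL cannot tell a keyless message apart from a row written while the feature was off.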

Related Features

  • Persist Extra Metadata: Companion feature that persists message envelope metadata (offset, publish time, properties) as a Variant column.
  • Schema Evolution: The mechanism by which the __key column is added to the table.
  • Upsert: Deduplication semantics on a primary key, which is a different concern from persisting the source message key.