> ## Documentation Index
> Fetch the complete documentation index at: https://docs.streamnative.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Persist Key

The **Persist Key** feature injects a `__key` column into the lakehouse table that contains the original message key from the source topic. Use it when downstream queries need access to the partition key produced by the application -- for example, joining the lakehouse table with another dataset on the producer's key, computing per-key aggregations, or auditing message routing.

## What Gets Persisted

When the feature is enabled, every record written to the lakehouse table includes a `__key` column populated with the source message's key:

| Source          | Key value persisted                                                                           |
| --------------- | --------------------------------------------------------------------------------------------- |
| Pulsar producer | The Pulsar message key (`MessageBuilder.key(...)`), or `null` if the producer did not set one |
| Kafka producer  | The Kafka record key bytes, or `null` if the producer did not set one                         |

The column is added to the table by schema evolution and stored as a `binary` value, preserving the exact bytes the producer sent.

## Configuration

| Property     | Default | Description                                                                                          |
| ------------ | ------- | ---------------------------------------------------------------------------------------------------- |
| `persistKey` | `false` | Master switch. When `true`, the `__key` column is added to the table and populated for every record. |

### Required Companion Settings

The `__key` column is added through schema evolution, so schema evolution must remain enabled:

| Property                   | Default | Description                                                                               |
| -------------------------- | ------- | ----------------------------------------------------------------------------------------- |
| `tableEvolveSchemaEnabled` | `true`  | Required (the column is added by schema evolution). Only override if previously disabled. |

`persistKey` does **not** require Variant support, Iceberg V3, or any other feature flag.

### Configuration

Add the following to the compaction service `custom` config:

```yaml theme={null}
persistKey: "true"
tableEvolveSchemaEnabled: "true"   # default; only override if previously disabled
```

The same configuration applies to both Iceberg and Delta Lake.

## Querying the Key

Once enabled, the `__key` column appears in the lakehouse table alongside the user-defined fields. Examples:

**Spark SQL:**

```sql theme={null}
SELECT CAST(__key AS STRING) AS key,
       *
FROM iceberg_catalog.namespace.events
WHERE CAST(__key AS STRING) = 'user-42'
LIMIT 10;
```

```sql theme={null}
-- Per-key counts
SELECT CAST(__key AS STRING) AS key, COUNT(*) AS message_count
FROM iceberg_catalog.namespace.events
GROUP BY CAST(__key AS STRING)
ORDER BY message_count DESC;
```

If your producer keys are UTF-8 strings, cast the column to `STRING` for readability. For binary keys, query the column directly as `BINARY`.

## Behavior Notes

* **Adding the key column to existing tables.** Because the column is added through schema evolution (`tableEvolveSchemaEnabled=true`), enabling the flag on a topic that already has a lakehouse table appends the `__key` column on the next compaction. Records written before the flag was enabled will have a `null` value in the column.
* **Disabling the feature.** Setting `persistKey` back to `false` stops new records from receiving the key, but the column itself is not removed. Older rows retain their values.
* **Null keys.** Messages without a key are written with `null` in the `__key` column.
* **Combining with upsert.** The `__key` column is independent of [identifier fields](/cloud/lakehouse/features/iceberg-upsert) used for upsert. It records the producer-side key as-is and is not used for deduplication.

## Related

* [Persist Extra Metadata](/cloud/lakehouse/features/persist-extra-metadata) -- Companion feature that persists message envelope metadata (offset, publish time, properties) as a Variant column
* [Schema Evolution](/cloud/lakehouse/features/schema-evolution) -- The mechanism by which the `__key` column is added to the table
* [Upsert](/cloud/lakehouse/features/iceberg-upsert) -- For deduplication semantics on a primary key, which is a different concern from persisting the source message key
