# Persist Extra Metadata

The **Persist Extra Metadata** feature adds a `__meta` column to the lakehouse table that holds additional message metadata captured at write time. Use it when downstream queries need information that belongs to the message envelope rather than the message body, such as the message offset, publish time, event time, or producer-side properties.

## What Gets Persisted

When the feature is enabled, every record written to the lakehouse table includes a `__meta` column populated with the following fields:

| Field             | Description                                                                        |
| ----------------- | ---------------------------------------------------------------------------------- |
| `__messageOffset` | The original Pulsar / Kafka message ID or offset                                   |
| `__publishTime`   | Pulsar publish timestamp (epoch ms)                                                |
| `__eventTime`     | Pulsar event time (epoch ms), if set by the producer                               |
| `__schemaVersion` | Numeric schema version of the source message                                       |
| `__properties`    | Producer-supplied key/value properties (Pulsar message properties / Kafka headers) |

The column is stored as an Iceberg / Delta **Variant** value. Because Variant is semi-structured, the table schema stays stable as properties evolve (keys added or removed), and downstream engines can extract individual fields with standard Variant accessors (for example, `__meta:__publishTime` or `variant_get(__meta, '$.__publishTime', 'long')` in Spark SQL).
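To make the shape of the column concrete, the following Python sketch models a single `__meta` value as a plain dictionary and mimics a nested path lookup such as `__meta:__properties:trace_id`. It is illustrative only: `build_meta` and `get_path` are hypothetical helpers, the property key `trace_id` and all sample values are made up, and representing an unset event time as `None` is an assumption rather than the documented encoding.

```python
from typing import Any, Optional


def build_meta(
    message_offset: str,
    publish_time_ms: int,
    event_time_ms: Optional[int],
    schema_version: int,
    properties: dict[str, str],
) -> dict[str, Any]:
    """Hypothetical model of one `__meta` value (not the real writer's encoder)."""
    return {
        "__messageOffset": message_offset,   # Pulsar message ID or Kafka offset
        "__publishTime": publish_time_ms,     # epoch milliseconds
        "__eventTime": event_time_ms,         # assumption: None when the producer set no event time
        "__schemaVersion": schema_version,
        "__properties": dict(properties),     # Pulsar properties / Kafka headers
    }


def get_path(value: Any, *keys: str) -> Any:
    """Walk nested keys, returning None as soon as a key is missing."""
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Made-up sample values.
meta = build_meta("12:34:0", 1700000000000, None, 2, {"trace_id": "abc123"})
print(get_path(meta, "__properties", "trace_id"))  # abc123
print(get_path(meta, "__properties", "tenant"))    # None
```

A schema-less container is what lets the set of property keys differ from record to record; a struct column would instead need a new field, and a schema change, for every new key.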

## Configuration

| Property               | Default | Description                                                                                           |
| ---------------------- | ------- | ----------------------------------------------------------------------------------------------------- |
| `persistExtraMetadata` | `false` | Master switch. When `true`, the `__meta` column is added to the table and populated for every record. |

### Required Companion Settings

Because `__meta` is a Variant column, the feature shares the [Variant type](/private-cloud/v2/configure-private-cloud/private-preview/ursa-lakehouse/features/variant-type) prerequisites:

| Property                   | Default | Required When                                                               |
| -------------------------- | ------- | --------------------------------------------------------------------------- |
| `variantTypeEnabled`       | `false` | Always (master switch for Variant support)                                  |
| `tableEvolveSchemaEnabled` | `true`  | Always (the column is added by schema evolution)                            |
| `allowIcebergV3`           | `false` | Iceberg only (Variant is an Iceberg V3 feature); ignored for Delta          |

> **Important:** Variant support is gated by a feature flag. Contact the StreamNative Support Team to enable it before turning on `persistExtraMetadata`.

### Iceberg

Add the following to the compaction service `custom` config:

```yaml
persistExtraMetadata: "true"
variantTypeEnabled: "true"
tableEvolveSchemaEnabled: "true"   # default; only override if previously disabled
allowIcebergV3: "true"
```

> **Downstream query engine compatibility:** When `allowIcebergV3` is enabled, your readers (Spark, Trino, Athena, etc.) must support Iceberg V3 to read tables that contain Variant columns. See [Variant Type](/private-cloud/v2/configure-private-cloud/private-preview/ursa-lakehouse/features/variant-type#iceberg) for details.

### Delta Lake

```yaml
persistExtraMetadata: "true"
variantTypeEnabled: "true"
tableEvolveSchemaEnabled: "true"
```

Delta does not require `allowIcebergV3`.
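Because the required companion settings differ by table format, a small pre-flight check can catch a missing setting before it reaches the compaction service. The Python sketch below is a hypothetical helper, not part of any StreamNative tooling: it applies the defaults from the configuration tables on this page to a `custom` config given as string values (as in the YAML snippets) and reports which prerequisites are still off.

```python
# Defaults taken from the configuration tables on this page.
DEFAULTS = {
    "persistExtraMetadata": "false",
    "variantTypeEnabled": "false",
    "tableEvolveSchemaEnabled": "true",
    "allowIcebergV3": "false",
}


def is_enabled(custom: dict[str, str], key: str) -> bool:
    """Resolve a boolean setting, falling back to its default when absent."""
    return custom.get(key, DEFAULTS[key]).strip().lower() == "true"


def missing_prerequisites(custom: dict[str, str], table_format: str) -> list[str]:
    """Return companion settings that must be enabled but are not (hypothetical check)."""
    if table_format not in ("iceberg", "delta"):
        raise ValueError(f"unknown table format: {table_format!r}")
    if not is_enabled(custom, "persistExtraMetadata"):
        return []  # feature off: nothing to check
    required = ["variantTypeEnabled", "tableEvolveSchemaEnabled"]
    if table_format == "iceberg":
        required.append("allowIcebergV3")  # Variant is an Iceberg V3 feature
    return [key for key in required if not is_enabled(custom, key)]


config = {
    "persistExtraMetadata": "true",
    "variantTypeEnabled": "true",
    # tableEvolveSchemaEnabled omitted: its default is already "true"
}
print(missing_prerequisites(config, "iceberg"))  # ['allowIcebergV3']
print(missing_prerequisites(config, "delta"))    # []
```

The check cannot see the Variant feature flag that the StreamNative Support Team enables, so an empty result is necessary but not sufficient.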

## Querying the Metadata

Once enabled, the `__meta` column appears in the lakehouse table alongside the user-defined fields. Examples:

**Spark SQL (Iceberg):**

```sql
SELECT __meta:__publishTime::bigint         AS publish_time,
       __meta:__messageOffset::string       AS message_offset,
       __meta:__properties:trace_id::string AS trace_id,  -- trace_id is an example property key
       *
FROM iceberg_catalog.namespace.events
WHERE __meta:__publishTime::bigint >= 1700000000000  -- cast the Variant value before comparing it
LIMIT 10;
```

**Spark SQL (Delta):**

```sql
SELECT variant_get(__meta, '$.__publishTime', 'long')     AS publish_time,
       variant_get(__meta, '$.__messageOffset', 'string') AS message_offset,
       *
FROM delta.`s3://bucket/path/events`
LIMIT 10;
```

The exact Variant accessor syntax depends on your engine version; consult the engine's Variant documentation for the canonical form.
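`__publishTime` and `__eventTime` are epoch milliseconds, so a time-range filter needs a millisecond literal such as the `1700000000000` in the Iceberg example. The following Python sketch converts between timezone-aware datetimes and epoch milliseconds using exact integer arithmetic; `to_epoch_ms` and `from_epoch_ms` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt: datetime) -> int:
    """Convert an aware datetime to epoch milliseconds (sub-millisecond part is floored)."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_ms(ms: int) -> datetime:
    """Convert epoch milliseconds (e.g. a __publishTime value) to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


cutoff = to_epoch_ms(datetime(2023, 11, 14, 22, 13, 20, tzinfo=timezone.utc))
print(cutoff)                              # 1700000000000
print(from_epoch_ms(cutoff).isoformat())   # 2023-11-14T22:13:20+00:00
```

Dividing one `timedelta` by another with `//` yields an `int`, which avoids the floating-point rounding that `dt.timestamp() * 1000` can introduce.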

## Behavior Notes

* **Adding metadata to existing tables.** Because the column is added through schema evolution (`tableEvolveSchemaEnabled=true`), enabling the flag on a topic that already has a lakehouse table appends the `__meta` column on the next compaction. Records written before the flag was enabled will have a `null` value in the column.
* **Disabling the feature.** Setting `persistExtraMetadata` back to `false` stops new records from receiving metadata, but the column itself is not removed. Older rows retain their values.
* **Performance impact.** The `__meta` column is small per row, but it slightly increases storage size and write cost. The Variant encoding is compact, and engines that can extract individual Variant fields may be able to push predicates on those fields down to the scan.
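Because rows written before the flag was enabled carry a `null` `__meta`, downstream code that reads the column should treat it as optional. A minimal Python sketch, assuming each row arrives as a dictionary whose `__meta` entry is either `None` or a mapping of the fields listed in the table above; `publish_time_or_none` is a hypothetical helper and the rows are made up.

```python
from typing import Any, Mapping, Optional


def publish_time_or_none(row: Mapping[str, Any]) -> Optional[int]:
    """Return __publishTime from a row, or None if the row has no metadata."""
    meta = row.get("__meta")
    if meta is None:
        return None  # e.g. written before persistExtraMetadata was enabled
    return meta.get("__publishTime")


rows = [
    {"id": 1, "__meta": None},                               # made-up pre-enable row
    {"id": 2, "__meta": {"__publishTime": 1700000000000}},   # made-up row with metadata
]
with_meta = [r["id"] for r in rows if publish_time_or_none(r) is not None]
print(with_meta)  # [2]
```

In SQL engines with standard `NULL` semantics the same effect appears implicitly: a path lookup on a `null` `__meta` yields `NULL`, so a publish-time filter silently excludes pre-enable rows.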

## Related

* [Variant Type](/private-cloud/v2/configure-private-cloud/private-preview/ursa-lakehouse/features/variant-type) -- Prerequisite feature for `persistExtraMetadata`
* [Schema Evolution](/private-cloud/v2/configure-private-cloud/private-preview/ursa-lakehouse/features/schema-evolution) -- The mechanism by which the `__meta` column is added to the table
* [Persist Key](/private-cloud/v2/configure-private-cloud/private-preview/ursa-lakehouse/features/persist-key) -- Companion feature that persists the message key as a separate column
