> ## Documentation Index
> Fetch the complete documentation index at: https://docs.streamnative.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Upsert

Upsert mode enables deduplication of records based on a primary key. When multiple records with the same key arrive, only the latest value is retained in the lakehouse table.

`upsert.mode.enabled` and `identifier.fields` are [dynamic configuration](../dynamic-configuration) keys.

> **Cluster-name prefix:** All dynamic configuration keys must be prefixed with the cluster name (for example, `<cluster-name>.upsert.mode.enabled`). The cluster name is the value of `clusterName` in `conf/broker.conf` -- see [Finding the Cluster Name](../dynamic-configuration#finding-the-cluster-name). The examples below use `private-cloud` as the cluster name; replace it with the name of your cluster.

## Configuration Keys

| Key                                  | Scope                       | Description                                                                                                           | Default |
| ------------------------------------ | --------------------------- | --------------------------------------------------------------------------------------------------------------------- | ------- |
| `<cluster-name>.upsert.mode.enabled` | Cluster / Namespace / Topic | Enable upsert mode (`true` / `false`). More-specific scopes override broader ones.                                    | `false` |
| `<cluster-name>.identifier.fields`   | Topic only                  | Comma-separated list of fields used as the primary key. Setting this at the cluster or namespace level has no effect. | --      |

## How Upsert Works

StreamNative Ursa implements upsert using **Iceberg equal-delete** files. For each upsert write, the compaction service emits an equal-delete entry that matches the previous record by the configured identifier fields, followed by the new record. Readers reconcile the delete and the new value at query time.

### Catalog Compatibility

Because upsert depends on equal-delete files, the underlying catalog and query engine must support reading Iceberg equal deletes. Catalogs differ in their delete-file support:

| Catalog                                      | Equal Delete              | Upsert Support     |
| -------------------------------------------- | ------------------------- | ------------------ |
| Snowflake Open Catalog (Polaris)             | No (position delete only) | Not supported      |
| Snowflake Horizon Catalog                    | No (position delete only) | Not supported      |
| Databricks Unity Catalog (Managed Iceberg)   | No (position delete only) | Not supported      |
| AWS S3Table                                  | Yes                       | Supported          |
| Google BigLake                               | Under confirmation        | Under confirmation |
| Iceberg Hadoop catalog (no external catalog) | Yes                       | Supported          |

If you need upsert with a catalog that only supports position deletes, the recommended approach is to use the Hadoop catalog or AWS S3Table for the affected tables.

## Apply at Cluster Level

Cluster-level properties are stored in the `sn/system` namespace. Cluster-level upsert applies as the default for every namespace and topic that does not override it.

```bash theme={null}
bin/pulsar-admin namespaces set-properties \
  -p private-cloud.cluster.upsert.mode.enabled=true \
  sn/system
```

## Apply at Namespace Level

```bash theme={null}
bin/pulsar-admin namespaces set-properties \
  -p private-cloud.upsert.mode.enabled=true \
  <tenant>/<namespace>
```

## Apply at Topic Level

`identifier.fields` must be set on the topic; setting `upsert.mode.enabled` on the topic is also valid and overrides any namespace or cluster default.

```bash theme={null}
bin/pulsar-admin topics update-properties \
  -p private-cloud.upsert.mode.enabled=true \
  -p private-cloud.identifier.fields=<field1>,<field2> \
  persistent://<tenant>/<namespace>/<topic>
```

## Example

Enable upsert on `persistent://public/default/users` with `userId` and `email` as the primary key:

```bash theme={null}
bin/pulsar-admin topics update-properties \
  -p private-cloud.upsert.mode.enabled=true \
  -p private-cloud.identifier.fields=userId,email \
  persistent://public/default/users
```

## Requirements

* Identifier fields **must exist** in the topic schema.
* Identifier fields **must be marked as `required`** in the schema (not nullable).
* The catalog and query engine must support reading Iceberg equal-delete files. See [Catalog Compatibility](#catalog-compatibility).

## Commit Behavior

* **Append-only writes** (default) are batch-committed: multiple Parquet files are grouped and committed to the catalog in a single commit.
* **Upsert writes** are committed one-by-one to ensure correct data ordering. This may slightly reduce throughput compared to append-only mode.

## Limitations

* When upsert is combined with a [partition key](/cloud/lakehouse/features/iceberg-partition-key), identifier fields only deduplicate within the same partition. For example, if the table has `partition.key=region` and `identifier.fields=userId`, two records with the same `userId` but different `region` values are both kept.
* Upsert is supported only for External Tables (SDT).

## Related

* [Dynamic Configuration Guide](../dynamic-configuration) -- Cluster-name prefix, override priority, and apply procedure
* [Partition Key](/cloud/lakehouse/features/iceberg-partition-key) -- Combining partition keys with upsert has compatibility constraints
