Upsert mode enables deduplication of records based on a primary key. When multiple records with the same key arrive, only the latest value is retained in the lakehouse table.Documentation Index
Fetch the complete documentation index at: https://docs.streamnative.io/llms.txt
Use this file to discover all available pages before exploring further.
upsert.mode.enabled and identifier.fields are dynamic configuration keys.
Cluster-name prefix: All dynamic configuration keys must be prefixed with the cluster name (for example,<cluster-name>.upsert.mode.enabled). The cluster name is the value ofclusterNameinconf/broker.conf— see Finding the Cluster Name. The examples below useprivate-cloudas the cluster name; replace it with the name of your cluster.
Configuration Keys
| Key | Scope | Description | Default |
|---|---|---|---|
<cluster-name>.upsert.mode.enabled | Cluster / Namespace / Topic | Enable upsert mode (true / false). More-specific scopes override broader ones. | false |
<cluster-name>.identifier.fields | Topic only | Comma-separated list of fields used as the primary key. Setting this at the cluster or namespace level has no effect. | — |
How Upsert Works
StreamNative Ursa implements upsert using Iceberg equal-delete files. For each upsert write, the compaction service emits an equal-delete entry that matches the previous record by the configured identifier fields, followed by the new record. Readers reconcile the delete and the new value at query time.Catalog Compatibility
Because upsert depends on equal-delete files, the underlying catalog and query engine must support reading Iceberg equal deletes. Catalogs differ in their delete-file support:| Catalog | Equal Delete | Upsert Support |
|---|---|---|
| Snowflake Open Catalog (Polaris) | No (position delete only) | Not supported |
| Snowflake Horizon Catalog | No (position delete only) | Not supported |
| Databricks Unity Catalog (Managed Iceberg) | No (position delete only) | Not supported |
| AWS S3Table | Yes | Supported |
| Google BigLake | Under confirmation | Under confirmation |
| Iceberg Hadoop catalog (no external catalog) | Yes | Supported |
Apply at Cluster Level
Cluster-level properties are stored in thesn/system namespace. Cluster-level upsert applies as the default for every namespace and topic that does not override it.
Apply at Namespace Level
Apply at Topic Level
identifier.fields must be set on the topic; setting upsert.mode.enabled on the topic is also valid and overrides any namespace or cluster default.
Example
Enable upsert onpersistent://public/default/users with userId and email as the primary key:
Requirements
- Identifier fields must exist in the topic schema.
- Identifier fields must be marked as
requiredin the schema (not nullable). - The catalog and query engine must support reading Iceberg equal-delete files. See Catalog Compatibility.
Commit Behavior
- Append-only writes (default) are batch-committed: multiple Parquet files are grouped and committed to the catalog in a single commit.
- Upsert writes are committed one-by-one to ensure correct data ordering. This may slightly reduce throughput compared to append-only mode.
Limitations
- When upsert is combined with a partition key, identifier fields only deduplicate within the same partition. For example, if the table has
partition.key=regionandidentifier.fields=userId, two records with the sameuserIdbut differentregionvalues are both kept. - Upsert is supported only for External Tables (SDT).
Related
- Dynamic Configuration Guide — Cluster-name prefix, override priority, and apply procedure
- Partition Key — Combining partition keys with upsert has compatibility constraints