Partition keys control how data is organized into partitions within the Iceberg table. Partitioning improves query performance by enabling partition pruning.
partition.key is a dynamic configuration key that takes effect only at the topic level. Setting it at the cluster or namespace level has no effect.
Cluster-name prefix: All dynamic configuration keys must be prefixed with the cluster name (for example, `<cluster-name>.partition.key`). The cluster name is the value of `clusterName` in `conf/broker.conf` (see Finding the Cluster Name). The examples below use `private-cloud` as the cluster name; replace it with the name of your cluster.
Cardinality limit: Keep the total number of partition values across all levels under 10 (the cardinality of `key1 × key2 × ... × keyN` should not exceed 10). If the partitioning would produce more than 10 distinct partition values, use the `bucket[N]` transform to bound it. Excessive partitions cause many small files, which degrade write throughput and query performance.
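The arithmetic behind the cardinality limit can be sketched as follows. This is only an illustration: the function name and the sample per-level counts are hypothetical, and the limit of 10 is taken from the guidance above.

```python
from math import prod

MAX_PARTITION_VALUES = 10  # limit stated in this guide


def total_partition_values(cardinalities):
    """Multiply the distinct-value count of every partition level."""
    return prod(cardinalities)


# Hypothetical counts: a column with 3 distinct values and one with 4.
unbounded = total_partition_values([3, 4])
print(unbounded, unbounded <= MAX_PARTITION_VALUES)  # 12 False

# Wrapping the second column in bucket[2] caps that level at 2 values.
bounded = total_partition_values([3, 2])
print(bounded, bounded <= MAX_PARTITION_VALUES)  # 6 True
```

A `bucket[N]` level contributes at most N values to the product, which is why it is the suggested way to bound a high-cardinality column.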
Configuration Format
The partition key is specified as a JSON array. Each element has three fields:

| Field | Required | Description |
|---|---|---|
| `sourceColumn` | Yes | The field name from the topic schema |
| `transform` | No | Iceberg partition transform function. Defaults to `identity`. |
| `targetName` | No | Custom name for the transformed partition column |
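A value can be checked against this format before it is applied. The sketch below is a local pre-flight check, not part of the product: `check_partition_key` is a hypothetical helper, and rejecting unknown fields is an assumption that is stricter than anything this page states.

```python
import json

ALLOWED_FIELDS = {"sourceColumn", "transform", "targetName"}


def check_partition_key(value):
    """Parse a partition key string and return its elements.

    Raises ValueError if the string is not a JSON array of objects
    that each carry the required sourceColumn field.
    """
    parsed = json.loads(value)  # json.JSONDecodeError is a ValueError
    if not isinstance(parsed, list):
        raise ValueError("partition key must be a JSON array")
    for element in parsed:
        if not isinstance(element, dict):
            raise ValueError("each element must be a JSON object")
        if "sourceColumn" not in element:
            raise ValueError("sourceColumn is required")
        unknown = set(element) - ALLOWED_FIELDS
        if unknown:
            # Assumption: treat unexpected fields as mistakes.
            raise ValueError(f"unknown fields: {sorted(unknown)}")
    return parsed


elements = check_partition_key('[{"sourceColumn":"userId","transform":"bucket[8]"}]')
print(elements[0]["sourceColumn"])  # userId
```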
Supported Iceberg Transforms
| Transform | Description |
|---|---|
| `identity` | Use the field value as-is |
| `bucket[N]` | Hash into N buckets |
| `truncate[N]` | Truncate strings to N characters |
| `year` | Extract year from timestamp |
| `month` | Extract month from timestamp |
| `day` | Extract day from timestamp |
| `hour` | Extract hour from timestamp |
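One way to catch a mistyped transform early is to match it against the names in the table. The sketch below assumes that names are case-sensitive and that N is a positive integer; neither is stated on this page, and `is_supported_transform` is a hypothetical helper.

```python
import re

# The seven transforms listed above. Assumption: N is a positive integer.
_TRANSFORM = re.compile(
    r"identity|year|month|day|hour|(?:bucket|truncate)\[[1-9][0-9]*\]"
)


def is_supported_transform(name):
    """Return True if name matches one of the listed transforms exactly."""
    return _TRANSFORM.fullmatch(name) is not None


print(is_supported_transform("bucket[8]"))    # True
print(is_supported_transform("truncate[7]"))  # True
print(is_supported_transform("bucket"))       # False: N is missing
print(is_supported_transform("week"))         # False: not in the table
```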
Apply at Topic Level
Setting `partition.key` at the cluster (sn/system) or namespace level has no effect. Apply it on the topic only.
Example
Configure two partition keys on a topic:

- `timestamp`: partitioned by hour, named `ts_hour`
- `address`: truncated to 7 characters, named `t_address`
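The key and value for this example can be assembled as in the sketch below. It reads "by hour" as the `hour` transform and uses `private-cloud` as the cluster name, as elsewhere in this guide. It only builds the strings; the apply procedure is covered in the Dynamic Configuration Guide.

```python
import json

CLUSTER_NAME = "private-cloud"  # example cluster name used in this guide

partition_key = [
    {"sourceColumn": "timestamp", "transform": "hour", "targetName": "ts_hour"},
    {"sourceColumn": "address", "transform": "truncate[7]", "targetName": "t_address"},
]

# Dynamic configuration keys carry the cluster name as a prefix.
config_key = f"{CLUSTER_NAME}.partition.key"
# Compact separators keep the JSON array on one line without spaces.
config_value = json.dumps(partition_key, separators=(",", ":"))

print(config_key)
print(config_value)
```

The first line printed is `private-cloud.partition.key`; the second is the JSON array to set as its value on the topic.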
Important Notes
- The `sourceColumn` value must reference a field that exists in the topic schema.
- The `targetName` is the name produced after applying the transform; it does not need to exist in the topic schema.
- The JSON value must be a valid JSON array. When passing it on the shell, escape inner double quotes (`\"`).
- If the JSON cannot be parsed, the system falls back to a non-partitioned table.
- Keep the total cardinality of partition values under 10. If a column has high cardinality, wrap it with `bucket[N]` to bound the number of partitions (for example, `{"sourceColumn":"userId","transform":"bucket[8]"}`). High partition counts produce many small files and degrade performance.
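The shell-escaping note can be illustrated with a short sketch that produces two quoted forms of the same value and confirms that a POSIX shell parser recovers the original JSON from each. The variable names are hypothetical, and the backslash form is only safe here because the value contains no `$`, backticks, or backslashes.

```python
import json
import shlex

value = json.dumps(
    [{"sourceColumn": "userId", "transform": "bucket[8]"}],
    separators=(",", ":"),
)

# Form 1: backslash-escape inner double quotes inside a double-quoted argument.
escaped = '"' + value.replace('"', '\\"') + '"'

# Form 2: let shlex.quote wrap the whole value in single quotes.
single_quoted = shlex.quote(value)

print(escaped)
print(single_quoted)

# Both forms parse back to the same single argument under POSIX rules.
print(shlex.split(escaped) == [value])        # True
print(shlex.split(single_quoted) == [value])  # True
```

Single-quoting avoids the manual escaping entirely, which is an alternative to the `\"` form when the command is typed directly into a POSIX shell.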
Related
- Dynamic Configuration Guide — Cluster-name prefix, override priority, and apply procedure
- Upsert — Combining partition keys with upsert has compatibility constraints