Partition keys control how data is organized into partitions within the Iceberg table. Partitioning improves query performance by enabling partition pruning. partition.key is a dynamic configuration key that takes effect only at the topic level. Setting it at the cluster or namespace level has no effect.
Cluster-name prefix: All dynamic configuration keys must be prefixed with the cluster name (for example, <cluster-name>.partition.key). The cluster name is the value of clusterName in conf/broker.conf — see Finding the Cluster Name. The examples below use private-cloud as the cluster name; replace it with the name of your cluster.
Cardinality limit: Keep the total number of partition values across all levels under 10 (the cardinality of key1 × key2 × ... × keyN should not exceed 10). If the partitioning would produce more than 10 distinct partition values, use the bucket[N] transform to bound it. Excessive partitions cause many small files, which degrade write throughput and query performance.
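As a rough check of the cardinality rule, the product of distinct values per level can be computed before choosing transforms. The sketch below is illustrative only: partition_cardinality is a hypothetical helper, not part of Pulsar or StreamNative tooling, and the per-column counts are made-up sample numbers.

```shell
# partition_cardinality is a hypothetical helper for illustration.
# It multiplies the expected number of distinct values at each partition level.
partition_cardinality() {
  total=1
  for n in "$@"; do
    total=$((total * n))
  done
  echo "$total"
}

# Assumed sample data: 3 distinct regions combined with 4 distinct statuses.
partition_cardinality 3 4   # prints 12, which exceeds the limit

# Replacing the second level with bucket[3] bounds it to 3 values.
partition_cardinality 3 3   # prints 9, which stays within the limit
```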

Configuration Format

The partition key is specified as a JSON array. Each element has three fields:
| Field | Required | Description |
| --- | --- | --- |
| sourceColumn | Yes | The field name from the topic schema |
| transform | No | Iceberg partition transform function. Defaults to identity. |
| targetName | No | Custom name for the transformed partition column |
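To make the three fields concrete, the following sketch assembles elements with and without the optional fields. partition_element is a hypothetical helper written for this illustration, and it does no JSON escaping, so it assumes field names without quotes or backslashes.

```shell
# partition_element is a hypothetical helper, not part of any Pulsar tooling.
# Usage: partition_element <sourceColumn> [transform] [targetName]
partition_element() {
  out="{\"sourceColumn\":\"$1\""
  # Optional fields are emitted only when a value is supplied.
  [ -n "${2:-}" ] && out="$out,\"transform\":\"$2\""
  [ -n "${3:-}" ] && out="$out,\"targetName\":\"$3\""
  printf '%s}\n' "$out"
}

# Only the required field; transform defaults to identity.
partition_element region
# Required field plus a transform.
partition_element userId 'bucket[8]'
# All three fields.
partition_element timestamp hour ts_hour

# Elements are combined into the JSON array that partition.key expects.
printf '[%s,%s]\n' \
  "$(partition_element timestamp hour ts_hour)" \
  "$(partition_element address 'truncate[7]' t_address)"
```

Omitting transform and targetName keeps the value short when the column should be used as-is.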

Supported Iceberg Transforms

| Transform | Description |
| --- | --- |
| identity | Use the field value as-is |
| bucket[N] | Hash into N buckets |
| truncate[N] | Truncate strings to N characters |
| year | Extract year from timestamp |
| month | Extract month from timestamp |
| day | Extract day from timestamp |
| hour | Extract hour from timestamp |
For full semantics, see the Iceberg partition transforms specification.
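For intuition, the sketch below approximates what three of these transforms produce for sample values. It is plain shell arithmetic rather than Iceberg code, and the timestamp and address are made-up sample data. Per the Iceberg specification, the time-based transforms yield a count of units elapsed since the Unix epoch rather than a calendar field.

```shell
# Illustration only: approximates transform results for sample values.
ts_seconds=1704067200          # assumed sample: 2024-01-01T00:00:00Z in Unix seconds

# hour: whole hours elapsed since 1970-01-01T00:00:00Z
ts_hour=$((ts_seconds / 3600))
echo "$ts_hour"                # prints 473352

# day: whole days elapsed since 1970-01-01
ts_day=$((ts_seconds / 86400))
echo "$ts_day"                 # prints 19723

# truncate[7] on a string keeps the first 7 characters (ASCII sample here)
address="221B Baker Street"
t_address=$(printf '%.7s' "$address")
echo "$t_address"              # prints 221B Ba
```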

Apply at Topic Level

bin/pulsar-admin topics update-properties \
  -p private-cloud.partition.key='[{"sourceColumn":"<field>","transform":"<transform>","targetName":"<name>"}]' \
  persistent://<tenant>/<namespace>/<topic>
Setting partition.key at the cluster (sn/system) or namespace level has no effect. Apply it on the topic only.

Example

Configure two partition keys on a topic:
  • timestamp: partitioned with the hour transform, named ts_hour
  • address: truncated to 7 characters with truncate[7], named t_address
bin/pulsar-admin topics update-properties \
  -p private-cloud.partition.key="[{\"sourceColumn\":\"timestamp\",\"transform\":\"hour\",\"targetName\":\"ts_hour\"},{\"sourceColumn\":\"address\",\"transform\":\"truncate[7]\",\"targetName\":\"t_address\"}]" \
  persistent://public/default/events
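The same value can also be written with single quotes, which avoids the backslash escapes. The sketch below holds the JSON in a shell variable (PARTITION_KEY is a name chosen for this illustration) and leaves the pulsar-admin call commented out so the snippet runs without a cluster.

```shell
# Same JSON as the example above, wrapped in single quotes so the inner
# double quotes need no backslashes.
PARTITION_KEY='[{"sourceColumn":"timestamp","transform":"hour","targetName":"ts_hour"},{"sourceColumn":"address","transform":"truncate[7]","targetName":"t_address"}]'

# Print the value to inspect it before applying.
printf '%s\n' "$PARTITION_KEY"

# Apply it by passing the variable in double quotes:
# bin/pulsar-admin topics update-properties \
#   -p private-cloud.partition.key="$PARTITION_KEY" \
#   persistent://public/default/events
```

Keeping the JSON in a variable also makes it easy to reuse the same value across several topics.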

Important Notes

  1. The sourceColumn value must reference a field that exists in the topic schema.
  2. The targetName is the name produced after applying the transform; it does not need to exist in the topic schema.
  3. The JSON value must be a valid JSON array. When passing it on the shell, either wrap the whole value in single quotes or, if you use double quotes, escape the inner double quotes (\").
  4. If the JSON cannot be parsed, the system falls back to a non-partitioned table.
  5. Keep the total cardinality of partition values under 10. If a column has high cardinality, wrap it with bucket[N] to bound the number of partitions (for example, {"sourceColumn":"userId","transform":"bucket[8]"}). High partition counts produce many small files and degrade performance.
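Because an unparseable value silently yields a non-partitioned table (note 4), it can help to check the JSON locally before applying it. The sketch below is one possible check, assuming python3 is available on PATH; is_json_array is a hypothetical helper, and it verifies only that the value parses as a JSON array, not that the field names or transforms are accepted.

```shell
# is_json_array is a hypothetical helper; it assumes python3 is on PATH.
# Exit status 0 means the argument parses as a JSON array.
is_json_array() {
  python3 -c '
import json, sys
try:
    value = json.loads(sys.argv[1])
except ValueError:
    sys.exit(1)
sys.exit(0 if isinstance(value, list) else 1)
' "$1"
}

if is_json_array '[{"sourceColumn":"userId","transform":"bucket[8]"}]'; then
  echo "valid"
else
  echo "invalid"
fi
```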
  • Dynamic Configuration Guide — Cluster-name prefix, override priority, and apply procedure
  • Upsert — Combining partition keys with upsert has compatibility constraints