conf/broker.conf and conf/offload.conf files within the Pulsar broker environment.
In conf/broker.conf, include the following configurations:
Configuration | Description | Required | Default Value |
---|---|---|---|
managedLedgerOffloadDriver | Case-insensitive offloader driver name (e.g., delta or iceberg) | Yes | N/A |
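
As a minimal sketch, the corresponding entry in conf/broker.conf could look like the following. It assumes the usual key=value format of Pulsar configuration files; the value delta is just one of the two driver names listed in the table.

```properties
# conf/broker.conf (fragment)
# Offloader driver name, case-insensitive: delta or iceberg
managedLedgerOffloadDriver=delta
```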
In conf/offload.conf, add the following configurations:
Configuration | Description | Required | Default Value |
---|---|---|---|
metadataServiceUri | Metadata service URI for BookKeeper client (e.g., zk://localhost:2181/ledgers) | Yes | N/A |
pulsarWebServiceUrl | Pulsar web service URL (e.g., http://localhost:8080) | Yes | N/A |
pulsarServiceUrl | Pulsar protocol service URL (e.g., pulsar://localhost:6650) | Yes | N/A |
offloadProvider | Offloader driver’s name (e.g., delta or iceberg) | Yes | N/A |
storagePath | Storage path (e.g., s3a://bucket-name/prefix or gs://bucket-name/prefix) | No | data |
googleCloudProjectID | GCS offload project configuration. For example: example-project | Required if offloading data to GCS | N/A |
googleCloudServiceAccountFile | GCS offload authentication. For example: /Users/user-name/Downloads/project-804d5e6a6f33.json | Required if offloading data to GCS | N/A |
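
Put together, a conf/offload.conf for a local test setup might look roughly like this. The sketch assumes the file uses key=value syntax and simply reuses the example values from the table above; replace them with the values for your own environment.

```properties
# conf/offload.conf (illustrative values taken from the table above)
metadataServiceUri=zk://localhost:2181/ledgers
pulsarWebServiceUrl=http://localhost:8080
pulsarServiceUrl=pulsar://localhost:6650
offloadProvider=delta
storagePath=s3a://bucket-name/prefix

# Only needed when offloading to GCS (storagePath would then use the gs:// prefix)
# googleCloudProjectID=example-project
# googleCloudServiceAccountFile=/Users/user-name/Downloads/project-804d5e6a6f33.json
```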
To offload data to Amazon S3, set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_KEY before starting up the broker service, and set the storagePath with the s3a:// prefix.
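
For S3, the credentials could be exported in the shell that later starts the broker, for example as below. The values are placeholders, not real credentials, and the storagePath line in the comment reuses the example bucket path from the table above.

```shell
# Placeholder credentials: substitute your own values.
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_KEY="<your-secret-key>"

# conf/offload.conf would then point storagePath at the bucket, e.g.:
#   storagePath=s3a://bucket-name/prefix
```

The variables must be present in the environment of the broker process, so export them in the same shell session (or service definition) that starts the broker.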
Lakehouse products such as Delta Lake are supported, with data typically written in Parquet format with Snappy compression. Specify the offload provider (delta or iceberg) in conf/offload.conf to choose the Lakehouse product.
Upon completing these configurations, start the Pulsar broker to initiate the Lakehouse tiered storage offload service.
After you complete the steps above, your Pulsar cluster supports the Lakehouse Storage feature. However, the feature is not enabled by default: you need to enable it at the namespace level by setting the namespace offload threshold.
For example:
Set the offload threshold to 0 with pulsar-admin, which means all data will be offloaded to Lakehouse immediately.
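
A sketch of that step, assuming the command is run from the Pulsar installation directory and using public/default purely as a placeholder namespace:

```shell
# Placeholder namespace; replace with the namespace you want to offload.
NAMESPACE="public/default"

# A size threshold of 0 triggers offloading as soon as possible.
bin/pulsar-admin namespaces set-offload-threshold --size 0 "$NAMESPACE"

# Read the threshold back to confirm the setting.
bin/pulsar-admin namespaces get-offload-threshold "$NAMESPACE"
```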
Use pulsar-perf to produce messages to a topic:
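
For instance, the following would publish to a test topic at roughly 1000 messages per second. The topic name and the rate are illustrative assumptions, and the command is expected to be run from the Pulsar installation directory.

```shell
# Hypothetical test topic in the namespace whose offload threshold was set.
TOPIC="persistent://public/default/test-topic"

# Publish at about 1000 msg/s until interrupted (Ctrl+C).
bin/pulsar-perf produce --rate 1000 "$TOPIC"
```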
Check the topic's internal stats with pulsar-admin:
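
One way to inspect the offload state is the stats-internal subcommand, which prints the topic's cursors and ledgers as JSON. The topic name below is a placeholder and should match the topic you produced to.

```shell
# Hypothetical test topic; use the topic you produced messages to.
TOPIC="persistent://public/default/test-topic"

# Prints internal stats as JSON, including the "cursors" map and the "ledgers" list.
bin/pulsar-admin topics stats-internal "$TOPIC"
```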
The __OFFLOAD cursor is present if the offload process has started.
Each ledger has an offloaded flag. Once a ledger has been offloaded to Lakehouse, its offloaded flag is set to true.
Note: The topic offload processor is triggered by ledger rollover. Once triggered, it offloads the subsequent ledgers in streaming mode and does not need to wait for further rollovers. As a result, if you produce messages to a topic and its first ledger has not yet rolled over, the offload process will not start.
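
When testing, the first rollover can be made to happen sooner by lowering the broker's ledger rollover settings. The fragment below is only a sketch: the keys are standard broker.conf managed-ledger settings, but the values are illustrative assumptions for a test cluster and are not taken from this guide.

```properties
# conf/broker.conf (fragment, test values only)
# Roll over once a ledger holds this many entries...
managedLedgerMaxEntriesPerLedger=1000
# ...but not before this much time has passed since the last rollover.
managedLedgerMinLedgerRolloverTimeMinutes=1
```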
pulsar-perf:
pulsar-admin: