HDFS3 Sink
The HDFS3 sink connector pulls messages from Pulsar topics and persists them to HDFS files.
Authored by: ASF
Support type: StreamNative
License: Apache License 2.0

Configuration

The configuration of the HDFS3 sink connector has the following properties.

Property

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| hdfsConfigResources | String | true | None | A file, or a comma-separated list of files, containing the Hadoop file system configuration. Examples: 'core-site.xml', 'hdfs-site.xml'. |
| directory | String | true | None | The HDFS directory where files are read from or written to. |
| encoding | String | false | None | The character encoding for the files. Examples: UTF-8, ASCII. |
| compression | Compression | false | None | The compression codec used to compress or decompress the files on HDFS. Available options: BZIP2, DEFLATE, GZIP, LZ4, SNAPPY. |
| kerberosUserPrincipal | String | false | None | The principal account of the Kerberos user used for authentication. |
| keytab | String | false | None | The full pathname of the Kerberos keytab file used for authentication. |
| filenamePrefix | String | false | None | The prefix of the files created inside the HDFS directory. Example: a value of topicA results in files named topicA-. |
| fileExtension | String | false | None | The extension added to the files written to HDFS. Examples: '.txt', '.seq'. |
| separator | char | false | None | The character used to separate records in a text file. If no value is provided, the contents of all records are concatenated into one continuous byte array. |
| syncInterval | long | false | 0 | The interval, in milliseconds, between calls to flush data to HDFS disk. |
| maxPendingRecords | int | false | Integer.MAX_VALUE | The maximum number of records held in memory before they are acknowledged. Setting this property to 1 causes every record to be sent to disk before it is acknowledged. Setting it to a higher value allows records to be buffered before they are flushed to disk. |
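
The effect of `separator` and `maxPendingRecords` can be easier to see in a small model. The Python sketch below is illustrative only and is not the connector's implementation: `join_records` and `PendingBuffer` are hypothetical names, the separator is assumed to be appended after every record (the property description does not say whether it trails the last record), and the time-based flushing controlled by `syncInterval` is left out.

```python
from typing import List, Optional


def join_records(records: List[bytes], separator: Optional[str] = None) -> bytes:
    """Model of the `separator` property for text files.

    Without a separator, payloads are concatenated into one continuous
    byte array. With one, it is appended after each record (an assumption).
    """
    if separator is None:
        return b"".join(records)
    sep = separator.encode("utf-8")
    return b"".join(record + sep for record in records)


class PendingBuffer:
    """Toy model of `maxPendingRecords`: records are held in memory until
    the limit is reached, then flushed together and acknowledged."""

    def __init__(self, max_pending: int) -> None:
        if max_pending < 1:
            raise ValueError("max_pending must be at least 1")
        self.max_pending = max_pending
        self.pending: List[bytes] = []
        self.flushed_batches: List[List[bytes]] = []
        self.acked = 0

    def write(self, record: bytes) -> None:
        # Hold the record; flush once the pending limit is reached.
        self.pending.append(record)
        if len(self.pending) >= self.max_pending:
            self.flush()

    def flush(self) -> None:
        # Stand-in for a flush to HDFS followed by acknowledging the records.
        if not self.pending:
            return
        self.flushed_batches.append(self.pending)
        self.acked += len(self.pending)
        self.pending = []


if __name__ == "__main__":
    records = [b"a", b"b", b"c"]
    print(join_records(records))        # b'abc'
    print(join_records(records, "\n"))  # b'a\nb\nc\n'

    strict = PendingBuffer(max_pending=1)
    strict.write(b"a")
    print(strict.acked)                 # 1
```

In this model, a limit of 1 flushes and acknowledges each record as soon as it is written, while a larger limit trades acknowledgment latency for fewer flushes.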

Example

Before using the HDFS3 sink connector, create a configuration file in one of the following formats.

  • JSON

    {
        "hdfsConfigResources": "core-site.xml",
        "directory": "/foo/bar",
        "filenamePrefix": "prefix",
        "compression": "SNAPPY"
    }
    
  • YAML

    configs:
        hdfsConfigResources: "core-site.xml"
        directory: "/foo/bar"
        filenamePrefix: "prefix"
        compression: "SNAPPY"
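
A configuration like the ones above can be sanity-checked locally before it is handed to the connector. The following Python sketch is one possible pre-check, not part of the connector: `check_sink_config` is a hypothetical helper, and the rules it applies (the two required properties, the listed compression options, and a minimum of 1 for `maxPendingRecords`) are taken or inferred from the property table rather than from the connector's own validation.

```python
import json
from typing import Any, Dict, List

# Required properties and compression options as listed in the property table.
REQUIRED_PROPERTIES = ("hdfsConfigResources", "directory")
COMPRESSION_OPTIONS = {"BZIP2", "DEFLATE", "GZIP", "LZ4", "SNAPPY"}


def check_sink_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems: List[str] = []
    for name in REQUIRED_PROPERTIES:
        if not config.get(name):
            problems.append(f"missing required property: {name}")

    compression = config.get("compression")
    if compression is not None and compression not in COMPRESSION_OPTIONS:
        problems.append(f"unsupported compression: {compression}")

    # Assumption: a limit below 1 is not meaningful for a record buffer.
    max_pending = config.get("maxPendingRecords")
    if max_pending is not None and max_pending < 1:
        problems.append("maxPendingRecords must be at least 1")
    return problems


if __name__ == "__main__":
    # The JSON example from this page.
    example = json.loads("""
    {
        "hdfsConfigResources": "core-site.xml",
        "directory": "/foo/bar",
        "filenamePrefix": "prefix",
        "compression": "SNAPPY"
    }
    """)
    print(check_sink_config(example))                 # []
    print(check_sink_config({"compression": "ZIP"}))  # three problems reported
```

Returning a list of problems instead of raising on the first one lets all issues in a file be reported in a single pass.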