Kinesis source
The Kinesis source connector pulls data from Amazon Kinesis and persists data into Pulsar. For more information about connectors, see Connector Overview.
This connector uses the Kinesis Client Library (KCL) to consume messages. The KCL uses DynamoDB to track consumer checkpoints and CloudWatch to track consumer metrics.
This document explains how to create an AWS Kinesis source connector and get it up and running.
Currently, the Kinesis source connector only supports raw messages. If you use AWS Key Management Service (KMS) encrypted messages, the encrypted messages are sent to Pulsar directly. You need to manually decrypt the data on the consumer side of Pulsar.
Quick start
Prerequisites
The prerequisites for connecting an AWS Kinesis source connector to external systems include:
- Create a Kinesis data stream in AWS.
- Create an AWS user and an AccessKey. Record the values of the AccessKey and its SecretKey.
- Assign the following permissions to the AWS user:
  - AmazonKinesisFullAccess
  - CloudWatch:PutMetricData: required because the AWS Kinesis client periodically sends metrics to CloudWatch.
  - AmazonDynamoDBFullAccess: required because the AWS Kinesis client uses DynamoDB to store checkpoint status.
1. Create a connector
The following command shows how to use pulsarctl to create a builtin connector. If you want to create a non-builtin connector, you need to replace --source-type kinesis with --archive /path/to/pulsar-io-kinesis.nar. You can find the button to download the nar package at the beginning of the document.
If you are a StreamNative Cloud user, you need to set up your environment first.
The --source-config value is a JSON string that holds the minimum configuration needed to start this connector. Substitute the relevant parameters with your own.
If you want to configure more parameters, see Configuration Properties for reference.
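As a rough illustration of what the --source-config JSON string can look like, the sketch below assembles one in Python. Note that awsCredentialPluginParam is itself a JSON document carried as a string, so it is encoded twice. The choice of keys follows the Configuration Properties table. The connector name, topic name, region, and credential values are placeholders, and the --name and --destination-topic-name flags are assumptions about the pulsarctl CLI that you should confirm with pulsarctl sources create --help.

```python
import json
import shlex


def build_source_config(region, stream_name, access_key, secret_key, application_name):
    """Return the connector configuration as a dict ready to be passed to --source-config."""
    # awsCredentialPluginParam is a JSON document stored as a string,
    # so it is serialized here and serialized again with the outer config.
    credential_param = json.dumps({"accessKey": access_key, "secretKey": secret_key})
    return {
        "awsRegion": region,
        "awsKinesisStreamName": stream_name,
        "awsCredentialPluginParam": credential_param,
        "applicationName": application_name,
    }


def build_create_command(config, name, topic):
    """Assemble a pulsarctl invocation as an argument list (nothing is executed)."""
    return [
        "pulsarctl", "sources", "create",
        "--source-type", "kinesis",
        "--name", name,                      # assumed flag, placeholder value
        "--destination-topic-name", topic,   # assumed flag, placeholder value
        "--source-config", json.dumps(config),
    ]


if __name__ == "__main__":
    config = build_source_config(
        region="us-west-2",
        stream_name="my-stream",
        access_key="YOUR_ACCESS_KEY",
        secret_key="YOUR_SECRET_KEY",
        application_name="pulsar-io-kinesis",
    )
    command = build_create_command(config, name="kinesis-source", topic="kinesis-topic")
    # shlex.join quotes the JSON so the line can be pasted into a POSIX shell.
    print(shlex.join(command))
```

Generating the string programmatically avoids hand-escaping the inner quotes of awsCredentialPluginParam, which is an easy place to make mistakes.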
You can also choose to use a variety of other tools to create a connector:
- pulsar-admin: The command arguments for pulsar-admin are similar to those of pulsarctl. You can find an example in the StreamNative Cloud Doc.
- REST API: You can find an example in the StreamNative Cloud Doc.
- Terraform: You can find an example in the StreamNative Cloud Doc.
- Function Mesh: The docker image can be found at the beginning of the document.
2. Send messages to Kinesis
The following example uses the Kinesis Producer Library (KPL) to send data to Kinesis. For more details, see Writing to your Kinesis Data Stream Using the KPL.
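If you only need a few test records and do not want to set up the KPL, a minimal sketch using the boto3 SDK instead is shown below. This swaps the KPL for plain PutRecord calls, so it does no batching or aggregation. The stream name, region, payload shape, and partition key scheme are all placeholders, and boto3 is assumed to be installed with credentials available in your environment.

```python
import json


def encode_record(payload):
    """Serialize a dict to the UTF-8 JSON bytes used as the Kinesis record body."""
    return json.dumps(payload).encode("utf-8")


def send_records(stream_name, region, payloads):
    """Put each payload on the stream with one PutRecord call per record."""
    import boto3  # imported lazily so encode_record works without boto3 installed

    client = boto3.client("kinesis", region_name=region)
    for index, payload in enumerate(payloads):
        client.put_record(
            StreamName=stream_name,
            Data=encode_record(payload),
            # Records that share a partition key are routed to the same shard.
            PartitionKey=f"key-{index}",
        )
```

For example, send_records("my-stream", "us-west-2", [{"id": 1, "message": "hello"}]) would write a single record to a stream named my-stream.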
3. Show data using Pulsar client
If your connector is created on StreamNative Cloud, you need to authenticate your clients. See Build applications using Pulsar clients for more information.
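A minimal sketch of reading the forwarded records with the Pulsar Python client (the pulsar-client package) might look like the following. The service URL, topic, and subscription name are placeholders, authentication for StreamNative Cloud is left out, and decoding the payload as UTF-8 assumes the producer wrote text, since the connector passes the raw bytes through unchanged.

```python
def decode_message(data):
    """Decode a raw payload as UTF-8 text, replacing any undecodable bytes."""
    return data.decode("utf-8", errors="replace")


def show_messages(service_url, topic, subscription, max_messages=10):
    """Print up to max_messages payloads from the connector's destination topic."""
    import pulsar  # provided by the pulsar-client package

    client = pulsar.Client(service_url)
    try:
        consumer = client.subscribe(topic, subscription_name=subscription)
        for _ in range(max_messages):
            # receive raises an exception if no message arrives within the timeout
            msg = consumer.receive(timeout_millis=10000)
            print(decode_message(msg.data()))
            consumer.acknowledge(msg)
    finally:
        # Closing the client also closes the consumer created from it.
        client.close()
```

Against a local standalone cluster, a call such as show_messages("pulsar://localhost:6650", "kinesis-topic", "test-sub") would print the first ten records it receives.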
Configuration Properties
This table outlines the properties of an AWS Kinesis source connector.
Name | Type | Required | Sensitive | Default | Description |
---|---|---|---|---|---|
awsKinesisStreamName | String | true | false | "" (empty string) | The Kinesis stream name. |
awsRegion | String | false | false | "" (empty string) | The AWS region. Examples: us-west-1, us-west-2. |
awsCredentialPluginName | String | false | false | "" (empty string) | The fully qualified class name of the AwsCredentialProviderPlugin implementation. For more information, see Configure AwsCredentialProviderPlugin. |
awsCredentialPluginParam | String | false | true | "" (empty string) | The JSON parameter used to initialize the AwsCredentialProviderPlugin. For more information, see Configure AwsCredentialProviderPlugin. |
awsEndpoint | String | false | false | "" (empty string) | The Kinesis endpoint URL. For the list of endpoints, see the AWS documentation. |
dynamoEndpoint | String | false | false | "" (empty string) | The DynamoDB endpoint URL. For the list of endpoints, see the AWS documentation. |
cloudwatchEndpoint | String | false | false | "" (empty string) | The CloudWatch endpoint URL. For more information, see the Amazon documentation. |
applicationName | String | false | false | Pulsar IO connector | The name of the Amazon Kinesis application, which will be used as the table name for DynamoDB. |
initialPositionInStream | InitialPositionInStream | false | false | LATEST | The position where the connector starts from. Available options: AT_TIMESTAMP (start from the record at or after the specified timestamp), LATEST (start after the most recent data record), TRIM_HORIZON (start from the oldest available data record). |
startAtTime | Date | false | false | "" (empty string) | If initialPositionInStream is set to AT_TIMESTAMP, this specifies the point in time at which consumption starts. |
checkpointInterval | Long | false | false | 60000 | The interval between Kinesis stream checkpoints, in milliseconds. |
backoffTime | Long | false | false | 3000 | The delay between requests, in milliseconds, when the connector encounters a throttling exception from AWS Kinesis. |
numRetries | int | false | false | 3 | The number of re-attempts when the connector encounters an exception while trying to set a checkpoint. |
receiveQueueSize | int | false | false | 1000 | The maximum number of AWS records that can be buffered inside the connector. Once the receiveQueueSize is reached, the connector does not consume any messages from Kinesis until some messages in the queue are successfully consumed. |
useEnhancedFanOut | boolean | false | false | true | If set to true, it uses Kinesis enhanced fan-out. If set to false, it uses polling. |
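
To make the defaults in the table concrete, the sketch below merges a user-supplied configuration with them and runs a few local sanity checks before the JSON is handed to the connector. The resolve_config helper and its checks are hypothetical and only mirror the table. They are not the connector's own validation logic.

```python
# Defaults copied from the configuration table; properties whose default is an
# empty string are omitted.
DEFAULTS = {
    "applicationName": "Pulsar IO connector",
    "initialPositionInStream": "LATEST",
    "checkpointInterval": 60000,
    "backoffTime": 3000,
    "numRetries": 3,
    "receiveQueueSize": 1000,
    "useEnhancedFanOut": True,
}
VALID_POSITIONS = {"AT_TIMESTAMP", "LATEST", "TRIM_HORIZON"}


def resolve_config(user_config):
    """Return user_config layered over the documented defaults (hypothetical helper)."""
    if not user_config.get("awsKinesisStreamName"):
        # The only property marked as required in the table.
        raise ValueError("awsKinesisStreamName is required")
    merged = {**DEFAULTS, **user_config}
    position = merged["initialPositionInStream"]
    if position not in VALID_POSITIONS:
        raise ValueError(f"unknown initialPositionInStream: {position}")
    if position == "AT_TIMESTAMP" and not merged.get("startAtTime"):
        # Local pre-flight check only: AT_TIMESTAMP needs a start time to be useful.
        raise ValueError("startAtTime must be set when using AT_TIMESTAMP")
    return merged
```

For instance, resolve_config({"awsKinesisStreamName": "my-stream", "useEnhancedFanOut": False}) keeps every other default and switches the connector from enhanced fan-out to polling.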
Configure AwsCredentialProviderPlugin
The AWS Kinesis source connector supports three ways to authenticate with AWS Kinesis, selected by configuring awsCredentialPluginName:
- Leave awsCredentialPluginName empty to authenticate the connector by passing accessKey and secretKey in awsCredentialPluginParam.
- Set awsCredentialPluginName to org.apache.pulsar.io.aws.AwsDefaultProviderChainPlugin to use the default AWS provider chain. With this option, you don’t need to configure awsCredentialPluginParam. For more information, see AWS documentation.
- Set awsCredentialPluginName to org.apache.pulsar.io.aws.STSAssumeRoleProviderPlugin to assume an IAM role on top of the default AWS provider chain. With this option, you need to configure roleArn and roleSessionName in awsCredentialPluginParam. For more information, see AWS documentation.
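
The three options above can be sketched as the following configuration fragments, each meant to be merged into the rest of the connector configuration. The helper function names are hypothetical and all credential and role values are placeholders. The plugin class names and parameter keys are the ones listed above.

```python
import json

DEFAULT_CHAIN = "org.apache.pulsar.io.aws.AwsDefaultProviderChainPlugin"
STS_ASSUME_ROLE = "org.apache.pulsar.io.aws.STSAssumeRoleProviderPlugin"


def static_keys(access_key, secret_key):
    """Option 1: empty plugin name, keys passed in awsCredentialPluginParam."""
    return {
        "awsCredentialPluginName": "",
        # The parameter is a JSON document encoded as a string.
        "awsCredentialPluginParam": json.dumps(
            {"accessKey": access_key, "secretKey": secret_key}
        ),
    }


def default_chain():
    """Option 2: default AWS provider chain, no awsCredentialPluginParam needed."""
    return {"awsCredentialPluginName": DEFAULT_CHAIN}


def assume_role(role_arn, role_session_name):
    """Option 3: STS assume-role plugin with roleArn and roleSessionName."""
    return {
        "awsCredentialPluginName": STS_ASSUME_ROLE,
        "awsCredentialPluginParam": json.dumps(
            {"roleArn": role_arn, "roleSessionName": role_session_name}
        ),
    }


if __name__ == "__main__":
    base = {"awsKinesisStreamName": "my-stream", "awsRegion": "us-west-2"}
    role = assume_role("arn:aws:iam::123456789012:role/demo-role", "pulsar-io-kinesis")
    print(json.dumps({**base, **role}, indent=2))
```

Keeping the credential settings in a separate fragment makes it easy to switch between the three modes without touching the rest of the configuration.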