BigQuery Connector integrates Apache Pulsar with Google BigQuery.
With --source-type bigquery, the connector is created as a builtin connector. If you want to create a non-builtin connector, replace --source-type bigquery with --archive /path/to/pulsar-io-bigquery.nar. You can find the button to download the NAR package at the beginning of the document.
--source-config is the minimum configuration required to start this connector, and it is a JSON string. Substitute the relevant parameters with your own.
If you want to configure more parameters, see Configuration Properties for reference.
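As a rough illustration of the two points above, the following Python sketch builds the minimal JSON string for --source-config and switches between --source-type bigquery and --archive. The helper names (build_source_config, connector_flags) and the placeholder values (my-project, my_dataset, my_table) are hypothetical and not part of the connector; only the flags and property names come from this document.

```python
import json
import shlex


def build_source_config(project_id, dataset_name, table_name, **extra):
    """Return the value for --source-config as a JSON string.

    The three positional arguments are the properties marked as required
    in the configuration table; `extra` carries any optional properties.
    """
    required = {
        "projectId": project_id,
        "datasetName": dataset_name,
        "tableName": table_name,
    }
    for name, value in required.items():
        if not value:
            raise ValueError(f"{name} is required")
    return json.dumps({**required, **extra})


def connector_flags(config_json, archive=None):
    """Return the connector-selection flags followed by --source-config.

    Without `archive`, the builtin connector is selected; with it, the
    NAR package at that path is used instead (non-builtin connector).
    """
    if archive:
        selector = ["--archive", archive]
    else:
        selector = ["--source-type", "bigquery"]
    return selector + ["--source-config", config_json]


if __name__ == "__main__":
    # Placeholder values; substitute your own project, dataset, and table.
    config_json = build_source_config("my-project", "my_dataset", "my_table")
    print(shlex.join(connector_flags(config_json)))
    print(shlex.join(connector_flags(
        config_json, archive="/path/to/pulsar-io-bigquery.nar")))
```

The printed flags are meant to be appended to whichever create command you already use. shlex.join quotes the JSON string so that it survives as a single shell argument.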
The parameters of pulsar-admin are similar to those of pulsarctl. You can find an example in the StreamNative Cloud Doc.
Use AUTO_CONSUMER to consume the data. For example:
Name | Type | Required | Sensitive | Default | Description
---|---|---|---|---|---
projectId | String | Yes | false | "" (empty string) | The Google BigQuery project ID.
datasetName | String | Yes | false | "" (empty string) | The Google BigQuery dataset name.
tableName | String | Yes | false | "" (empty string) | The Google BigQuery table name.
credentialJsonString | String | No | true | "" (empty string) | The authentication JSON key. When credentialJsonString is set to an empty string, set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains your service account key. For details, see the Google documentation.
maxParallelism | int | No | false | 1 | The maximum parallelism for reading. The actual number may be lower if the BigQuery source connector deems the data small enough.
forceUpdate | Boolean | No | false | false | If forceUpdate is set to true, a new session is created and the connector transmits the data again.
queueSize | int | No | false | 10000 | The buffer queue size of the source. The queue stores records before they are sent to Pulsar topics.
sql | String | No | false | "" (empty string) | The SQL query to run on BigQuery. The computed result is saved in a temporary table. The temporary table has a configurable expiration time, and the BigQuery source connector automatically deletes it once the data has been transferred completely. The projectId and datasetName get their values from the configuration file, and the tableName is generated from a UUID.
expirationTimeInMinutes | int | No | false | 1440 | The time in minutes after which the table expires and is automatically deleted.
selectedFields | String | No | false | "" (empty string) | Names of the fields in the table that should be read.
filters | String | No | false | "" (empty string) | A list of clauses that can filter the result of the table.
checkpointIntervalSeconds | int | No | false | 60 | The checkpoint interval, in seconds.
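
The table above can be mirrored in a small client-side check before a configuration is submitted. The sketch below is only an illustration: it fills in the documented defaults, rejects configurations that miss a required property or contain an unknown one, and masks the property marked as sensitive before logging. The names resolve_config and redact are hypothetical, and the connector's own validation may behave differently.

```python
# Defaults for the optional properties, copied from the table above.
DEFAULTS = {
    "credentialJsonString": "",
    "maxParallelism": 1,
    "forceUpdate": False,
    "queueSize": 10000,
    "sql": "",
    "expirationTimeInMinutes": 1440,
    "selectedFields": "",
    "filters": "",
    "checkpointIntervalSeconds": 60,
}
REQUIRED = ("projectId", "datasetName", "tableName")
SENSITIVE = {"credentialJsonString"}


def resolve_config(user_config):
    """Return the user's configuration with documented defaults filled in."""
    missing = [key for key in REQUIRED if not user_config.get(key)]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    unknown = sorted(set(user_config) - set(REQUIRED) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown properties: {unknown}")
    # User-supplied values override the defaults.
    return {**DEFAULTS, **user_config}


def redact(config):
    """Return a copy that is safe to log: non-empty sensitive values are masked."""
    return {
        key: "***" if key in SENSITIVE and value else value
        for key, value in config.items()
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own.
    resolved = resolve_config({
        "projectId": "my-project",
        "datasetName": "my_dataset",
        "tableName": "my_table",
        "queueSize": 5000,
    })
    print(redact(resolved))
```

Rejecting unknown keys is a design choice of this sketch that catches misspelled property names early; drop that check if you pass through properties that are not listed in the table.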