
CLAUDE.md - StreamNative Connect Documentation

This file provides guidance to Claude Code (claude.ai/code) when working with StreamNative connector documentation.

Purpose

The connect/ directory contains documentation for data connectors that enable integration between StreamNative/Pulsar and external systems, supporting both Pulsar IO and Kafka Connect frameworks.

Directory Structure

  • connectors/: Individual connector documentation
    • Each connector has its own directory (e.g., google-bigquery-sink/, snowflake-sink/)
    • Contains a current/ subdirectory for the latest documentation version
  • overview.mdx: General connector concepts and architecture

Source Code Repository Mappings

StreamNative Connectors

  • pulsar-io-bigquery (source_code_refs/pulsar-io-bigquery/): Google BigQuery connector
    • Sink connector for writing to BigQuery
    • Source connector for reading from BigQuery
    • Configuration in conf/pulsar-io-bigquery.yaml (see the config sketch after this list)
    • Documentation in docs/ directory
  • pulsar-io-snowflake-streaming (source_code_refs/pulsar-io-snowflake-streaming/): Snowflake streaming connector
    • High-performance sink for Snowflake data warehouse
    • Uses Snowflake Streaming API
    • Configuration examples in conf/ directory
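
When documenting these connectors, take the exact keys in conf/pulsar-io-bigquery.yaml and the Snowflake conf/ examples from the source repositories. The sketch below only illustrates the standard Pulsar IO sink config layout; the keys under configs are assumptions to be verified against the connector code.

```yaml
# Sketch of a Pulsar IO sink configuration for the BigQuery connector.
# Top-level keys follow the standard Pulsar IO sink config layout; the keys
# under `configs` are assumed names; verify them against the connector source.
tenant: public
namespace: default
name: bigquery-sink
inputs:
  - persistent://public/default/input-topic
archive: connectors/pulsar-io-bigquery.nar
parallelism: 1
configs:
  projectId: my-gcp-project          # assumed parameter name
  datasetName: my_dataset            # assumed parameter name
  tableName: my_table                # assumed parameter name
  credentialJsonString: "${GCP_CREDENTIALS_JSON}"   # assumed; prefer documenting a secret reference
```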

Kafka Connect Support

  • ksn (source_code_refs/ksn/): Kafka Connect runtime support
    • Implements Kafka Connect API compatibility
    • Enables running Kafka Connect connectors on Pulsar
    • Key directories:
      • kafka-connect/: Connect runtime implementation
      • docs/: Architecture and configuration guides
  • sn-operator (source_code_refs/sn-operator/): Kafka Connect deployment
    • kafkaconnect_controller.go: Manages Kafka Connect clusters
    • Handles connector lifecycle in Kubernetes
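
The actual KafkaConnect CRD schema is defined in sn-operator and reconciled by kafkaconnect_controller.go; the sketch below only illustrates the general shape of such a custom resource. The apiVersion, kind, and every field name are placeholders, not the operator's real API, and must be confirmed against source_code_refs/sn-operator/ before they appear in docs.

```yaml
# Hypothetical KafkaConnect custom resource; apiVersion, kind, and all fields
# are placeholders to be replaced with the real CRD schema from sn-operator.
apiVersion: example.streamnative.io/v1alpha1
kind: KafkaConnect
metadata:
  name: my-connect-cluster
  namespace: pulsar
spec:
  replicas: 1                                   # assumed: number of Connect worker pods
  image: example/kafka-connect-runtime:latest   # placeholder image
  config:                                       # assumed pass-through of Connect worker properties
    group.id: my-connect-cluster
    key.converter: org.apache.kafka.connect.json.JsonConverter
    value.converter: org.apache.kafka.connect.json.JsonConverter
```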

Documentation Patterns

Connector Documentation Structure

Each connector typically includes:
  1. Overview: What the connector does and use cases
  2. Prerequisites: Required accounts, permissions, dependencies
  3. Installation: How to deploy the connector
  4. Configuration: Detailed parameter reference
  5. Usage Examples: Step-by-step guides
  6. Schema Support: Data format and schema evolution
  7. Monitoring: Metrics and health checks
  8. Troubleshooting: Common issues and solutions

Configuration Documentation

  • Required Parameters: Clearly marked with descriptions
  • Optional Parameters: Default values and use cases
  • Security Parameters: Authentication and encryption options
  • Performance Parameters: Throughput and batching settings
  • Example Configurations: Common scenarios
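
An annotated example configuration often communicates required/optional/default status better than prose alone. The parameter names below are hypothetical; the point is the annotation style.

```yaml
# Annotation style for example configurations (parameter names are hypothetical).
configs:
  # --- Connection (required) ---
  endpoint: "https://example-service.invalid"   # required; no default
  # --- Authentication (required) ---
  apiKey: "${API_KEY}"                          # required; document the secret/env-var alternative
  # --- Performance (optional) ---
  batchSize: 1000                               # optional; default must match the connector source
  flushIntervalMs: 5000                         # optional; default must match the connector source
```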

SNIP References

Check source_code_refs/snip/proposals/ for design documents related to:
  • BigQuery connector (SNIP-39)
  • Snowflake connector (SNIP-49)
  • Kafka Connect support (SNIP-130, SNIP-134)
  • Connector secrets management (SNIP-107)
  • Connector UI enhancements (SNIP-109)
  • SQS sink improvements (SNIP-117)
  • Cloud storage package management (SNIP-132)

Common Tasks

Adding New Connector Documentation

  1. Create directory under connectors/{connector-name}/current/
  2. Follow the standard structure described above (overview, configuration, usage, etc.)
  3. Include architecture diagrams showing data flow
  4. Add to navigation in docs.json
  5. Update connector index/overview pages

Documenting Connector Configuration

  1. List all configuration parameters
  2. Group by category (connection, authentication, performance)
  3. Include validation rules and constraints
  4. Show example values and use cases
  5. Document environment variable alternatives

Cloud vs Self-Hosted Deployment

  1. Cloud: Console UI deployment steps
  2. Cloud: API/CLI deployment examples
  3. Private Cloud: Kubernetes CRD examples (see the sketch after this list)
  4. Standalone: Docker and binary deployment
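
For the Private Cloud CRD examples, copy working manifests from the relevant operator; the following only sketches the general shape of a CRD-based sink deployment, with placeholder apiVersion, kind, and fields.

```yaml
# Hypothetical shape of a CRD-based sink deployment for Private Cloud docs;
# apiVersion, kind, and fields are placeholders for the real CRD schema.
apiVersion: example.streamnative.io/v1alpha1
kind: Sink
metadata:
  name: bigquery-sink
  namespace: pulsar
spec:
  replicas: 1
  input:
    topics:
      - persistent://public/default/input-topic
  sinkConfig:
    projectId: my-gcp-project    # connector-specific keys; same caveats as the Pulsar IO sketch above
```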

Performance Tuning Guides

  1. Batch size optimization
  2. Parallelism settings
  3. Memory allocation
  4. Network timeout configuration
  5. Error handling and retry policies
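
The knob names differ per connector, but a tuning guide usually walks through a handful of settings like the generic placeholders sketched below.

```yaml
# Generic placeholders for the knobs a performance-tuning guide typically covers;
# none of these names belong to a specific connector.
parallelism: 4              # number of sink instances; scale with input partition count
configs:
  batchSize: 5000           # records per flush; larger batches trade latency for throughput
  lingerTimeMs: 1000        # maximum wait before flushing a partial batch
  maxRetries: 3             # retry attempts before dead-letter handling
  connectionTimeoutMs: 30000
```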

Connector Categories

Source Connectors

  • Database CDC (Debezium MySQL, PostgreSQL, MongoDB)
  • Message queues (Kafka, RabbitMQ, AWS SQS)
  • Cloud storage (S3, GCS, Azure Blob)
  • Streaming platforms (Kinesis, EventBridge)

Sink Connectors

  • Data warehouses (BigQuery, Snowflake)
  • Databases (Cassandra, MongoDB, JDBC)
  • Search engines (Elasticsearch)
  • Message queues (Kafka, SQS)
  • Cloud storage (S3, GCS, Azure Blob)
  • Analytics and vector databases (InfluxDB, Pinecone)

Kafka Connect Connectors

  • Kafka Connect-compatible connectors
  • Deployed via Kafka Connect runtime
  • Configured using the Connect REST API (see the example after this list)
  • Support for SMTs (Single Message Transforms)
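
Documentation for these connectors usually shows the configuration submitted to the Connect REST API. The keys name, connector.class, tasks.max, and topics are standard Kafka Connect keys; the class below is illustrative. The sketch is written as YAML for readability; the REST API itself takes the equivalent JSON body.

```yaml
# Standard Kafka Connect connector configuration keys (connector class is illustrative);
# shown as YAML for readability, submitted to the Connect REST API as equivalent JSON.
name: my-sink-connector
config:
  connector.class: com.example.ExampleSinkConnector
  tasks.max: "2"
  topics: input-topic
  key.converter: org.apache.kafka.connect.json.JsonConverter
  value.converter: org.apache.kafka.connect.json.JsonConverter
```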

Important Considerations

Exactly-Once Semantics

  • Which connectors support exactly-once delivery
  • Configuration requirements for guarantees
  • Performance implications
  • Failure recovery behavior
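
For Pulsar IO connectors, delivery guarantees are usually driven by the sink's processing-guarantee setting plus connector-specific behavior; which connectors honor EFFECTIVELY_ONCE end-to-end must be confirmed per connector. A minimal sketch, assuming the standard Pulsar IO sink config layout:

```yaml
# Delivery-guarantee setting in a Pulsar IO sink config (verify the exact field
# name and supported values against the target Pulsar version and connector).
name: bigquery-sink
inputs:
  - persistent://public/default/input-topic
processingGuarantees: EFFECTIVELY_ONCE   # ATMOST_ONCE | ATLEAST_ONCE | EFFECTIVELY_ONCE
configs: {}                              # connector-specific options may add requirements (e.g. idempotent writes)
```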

Schema Management

  • Schema registry integration
  • Schema evolution support
  • Data format conversions
  • Compatibility between source and sink

Security

  • Authentication methods by connector
  • Encryption in transit and at rest
  • Secret management best practices
  • Network security requirements

Monitoring and Operations

  • Metrics exposed by connectors
  • Health check endpoints
  • Log aggregation
  • Alert configuration
  • Scaling considerations

Cross-References

  • Cloud docs for connector deployment UI
  • Private Cloud docs for CRD-based deployment
  • Clients docs for producer/consumer patterns
  • API docs for connector management endpoints