Configuring Kafka Consumer
An Apache Kafka consumer is a client application that subscribes to one or more topics and reads and processes messages from them. This section describes the configuration options available for Kafka consumers.
Consumer Configuration
The following sections describe the key configuration settings for Kafka consumers and explain how they affect consumer behavior.
For common client settings, such as networking and authentication settings, see Configuring Kafka Clients.
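The examples in this section use the Java client; equivalent settings exist in the other clients. As a starting point, here is a minimal sketch of a consumer. The broker address, group ID, and topic name are placeholders:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BasicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "example-group");           // placeholder group ID
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("example-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```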
Group Configuration
group.id
A unique identifier for the consumer group. While optional, you should always configure a group ID unless you are using the simple assignment API and don't need to store offsets in Kafka.
session.timeout.ms
Controls how long a consumer can go without sending heartbeats to the coordinator before being considered failed. The default is 10 seconds for the C/C++ and Java clients. You can increase this value to avoid excessive rebalancing due to poor network connectivity or long GC pauses. However, using a larger timeout means it will take longer for the coordinator to detect crashed consumers and reassign their partitions to other group members. For normal shutdowns, the consumer explicitly leaves the group, triggering an immediate rebalance.
heartbeat.interval.ms
Controls how frequently the consumer sends heartbeats to the coordinator. These heartbeats also help detect when rebalancing is needed, so a lower interval generally enables faster rebalancing. The default is 3 seconds. For larger consumer groups, consider increasing this value to reduce coordinator load.
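For example, to tolerate longer pauses you might raise both values while keeping the heartbeat interval well below the session timeout (a common rule of thumb is no more than one third of it). The values below are illustrative:

```java
props.put("group.id", "example-group");      // placeholder group ID
props.put("session.timeout.ms", "30000");    // tolerate pauses of up to 30 seconds
props.put("heartbeat.interval.ms", "10000"); // one third of the session timeout
```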
max.poll.interval.ms
Specifies the maximum allowed time between calls to the consumer's poll method (or the Consume method in .NET) before the consumer is considered failed. The default is 300 seconds and can be increased if your application needs more time to process messages. For Java consumers, you can also adjust max.poll.records to control how many records are processed in each poll iteration.
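For instance, extending the earlier sketch, an application with slow per-record processing might loosen the poll interval and shrink the batch size; process() here is a hypothetical handler:

```java
props.put("max.poll.interval.ms", "600000"); // allow up to 10 minutes between polls
props.put("max.poll.records", "100");        // Java client only: cap records per poll

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        // process() is a hypothetical handler; the whole iteration must finish
        // within max.poll.interval.ms or the consumer is removed from the group.
        process(record);
    }
}
```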
Offset Management Configuration
There are two main settings that affect how offsets are managed: whether automatic offset committing is enabled and the offset reset policy.
enable.auto.commit
Controls whether the consumer automatically commits offsets periodically (default is true). When enabled, offsets are committed at the interval specified by auto.commit.interval.ms, which defaults to 5 seconds. When disabled, you must manually commit offsets using commitSync() or commitAsync().
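For at-least-once delivery, a common pattern is to disable auto-commit and commit synchronously only after all polled records have been processed. A sketch, again with a hypothetical process() handler:

```java
props.put("enable.auto.commit", "false");

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        process(record); // hypothetical handler
    }
    if (!records.isEmpty()) {
        consumer.commitSync(); // commit offsets only after processing succeeds
    }
}
```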
auto.offset.reset
Determines how the consumer behaves when it needs to read from a position with no committed offset, or when the committed offset is invalid (out of range). This can occur when:
- The consumer group is first created
- The committed offset has been deleted due to retention policies
- The consumer requests an offset that does not exist
Valid values are:
- latest (default): Start reading from the newest available messages
- earliest: Start reading from the beginning of the topic
- none: Throw an exception if no previous offset is found
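For example, to replay a topic from the beginning the first time a group starts:

```java
// Read from the start of the topic when the group has no committed offset.
props.put("auto.offset.reset", "earliest");

// Or fail fast so the application can decide where to seek:
// props.put("auto.offset.reset", "none");
// With "none", poll() throws NoOffsetForPartitionException in this situation.
```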
Partition Assignment Configuration
partition.assignment.strategy
Controls how partitions are distributed among consumer instances when using group management. All consumers in the same group must use the same strategy.
This setting accepts a comma-separated list of fully qualified class names that implement the PartitionAssignor
interface. Multiple strategies can be specified to support transitioning between strategies while maintaining compatibility with consumers using the previous strategy.
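For example, a sketch of migrating a group from range to sticky assignment: list both strategies during the transition, then remove the old one once every member has been upgraded.

```java
// During the transition, every consumer advertises both strategies so the
// group can still agree on a common one; list the preferred strategy first.
props.put("partition.assignment.strategy",
        "org.apache.kafka.clients.consumer.StickyAssignor,"
      + "org.apache.kafka.clients.consumer.RangeAssignor");
```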
The following strategies are available:
Range Assignment (Default)
- Class: org.apache.kafka.clients.consumer.RangeAssignor
- Behavior: Distributes partitions of each topic evenly across consumers in a group after sorting both partitions and consumers
- Best for: Cases where partition count exceeds consumer count
- Limitation: May result in uneven distribution if partition count is not divisible by consumer count
Round Robin Assignment
- Class: org.apache.kafka.clients.consumer.RoundRobinAssignor
- Behavior: Distributes partitions one by one across consumers in a round-robin fashion
- Best for: Scenarios requiring even distribution regardless of partition count
- Limitation: May trigger more frequent rebalances compared to Range Assignment
Sticky Assignment
- Class: org.apache.kafka.clients.consumer.StickyAssignor
- Behavior: Maintains stable partition assignments across rebalances while ensuring balanced distribution
- Best for: Applications sensitive to partition reassignment overhead
- Limitation: May not achieve optimal balance when cluster topology changes frequently
Cooperative Sticky Assignment
- Class: org.apache.kafka.clients.consumer.CooperativeStickyAssignor
- Behavior: Enables incremental rebalancing while maintaining sticky assignments
- Best for: Large consumer groups where minimizing rebalance impact is critical
- Limitation: Requires all consumers to support cooperative rebalancing protocol
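To opt in, configure the cooperative assignor on every member. When migrating a live group from an eager strategy, the Kafka upgrade notes describe a two-step rolling restart, sketched below:

```java
// Step 1 (first rolling restart): advertise both strategies so upgraded and
// non-upgraded members can still form a group.
props.put("partition.assignment.strategy",
        "org.apache.kafka.clients.consumer.CooperativeStickyAssignor,"
      + "org.apache.kafka.clients.consumer.StickyAssignor");

// Step 2 (second rolling restart): remove the eager strategy.
props.put("partition.assignment.strategy",
        "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
```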
Message Handling
The Java consumer performs all I/O and processing in the foreground thread, while librdkafka-based clients (C/C++, Python, Go, and C#) use a background thread. This architectural difference has several important implications:
Thread Safety: In librdkafka-based clients, polling is thread-safe and can be used from multiple threads. This allows you to parallelize message handling across multiple threads, as poll operations retrieve messages from a queue that's filled by the background thread.
Background Processing: With librdkafka-based clients, heartbeats and rebalancing occur in the background thread. This provides two key effects:
- Advantage: Message handling won't cause the consumer to miss a rebalance
- Disadvantage: If your message processor fails, the background thread continues sending heartbeats, causing the consumer to retain its partitions and potentially accumulate read lag until the process is terminated
Despite these architectural differences, the clients' approaches are conceptually similar. For example, you can implement a similar pattern in the Java client by introducing a queue between the poll loop and message processors. In this setup, the poll loop would populate the queue, and processors would consume messages from it, effectively creating a background processing model.
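A minimal sketch of that queue-based pattern, leaving offset management and per-partition ordering aside; process() is a hypothetical handler:

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class QueuedProcessing {
    // Bounded queue so a slow worker applies backpressure to the poll loop.
    private static final BlockingQueue<ConsumerRecord<String, String>> queue =
            new ArrayBlockingQueue<>(1000);

    public static void run(KafkaConsumer<String, String> consumer) throws InterruptedException {
        // The worker thread plays the role librdkafka's foreground processing
        // threads play: it drains messages that the poll loop enqueues.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    process(queue.take()); // hypothetical handler
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        consumer.subscribe(List.of("example-topic")); // placeholder topic
        while (true) {
            // The poll loop stays responsive, so group liveness does not depend
            // on processing speed. Note that if the queue fills up, put() blocks
            // and max.poll.interval.ms can still be exceeded.
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(100))) {
                queue.put(record);
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.value()); // stand-in for real work
    }
}
```

Note that this sketch reintroduces the disadvantage described above: because the poll loop keeps running, a failed worker does not stop heartbeats, so offsets should be committed only after the worker has actually processed the records.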