Thursday, December 12, 2019

Important Configuration Parameters of Kafka Producers

The list of Kafka producer configurations is quite long. The good news is that you are not forced to configure all of them; Kafka provides defaults for most, and these work for most cases. But if you care about performance, reliability, throughput, or latency, it is worth revisiting them and customizing them for your specific needs.

In this post, I will cover some of the important configurations.
Kafka Reference - here.


compression.type

Default value = none (i.e., no compression).
Available values = none, gzip, snappy, lz4, zstd (zstd is available from Kafka 2.1 onward)

This is the algorithm the producer (sitting in your application) uses to compress data before sending it to the brokers. Compression is applied to whole batches, so it is most effective when multiple messages are batched together before sending. Enabling compression reduces network utilization and storage, at the cost of some CPU. Snappy (developed by Google) provides decent compression ratios with low CPU overhead. Gzip typically provides a better compression ratio but uses more CPU. So if network bandwidth is limited, choose Gzip; otherwise, go for Snappy.
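To see why batching and compression go together, here is a small sketch using only Python's standard library. It is not Kafka code: the sample records are made up, and gzip is applied directly to raw bytes, so the numbers only illustrate the general effect.

```python
import gzip
import json

# Hypothetical sample records; real payloads will compress differently.
messages = [
    json.dumps({"user_id": i, "event": "page_view", "page": "/home"}).encode("utf-8")
    for i in range(100)
]

# Total size without any compression.
raw_total = sum(len(m) for m in messages)

# Compress every message on its own, as if each were sent unbatched.
individual_total = sum(len(gzip.compress(m)) for m in messages)

# Compress all messages together, as if they were sent as one batch.
batched_total = len(gzip.compress(b"".join(messages)))

print(f"raw bytes:             {raw_total}")
print(f"compressed one by one: {individual_total}")
print(f"compressed as a batch: {batched_total}")
```

Small messages compressed one at a time gain little, because each carries its own compression overhead and has few repeated patterns. Compressed together, the repeated field names shrink the batch considerably.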

batch.size

Default value = 16384 (i.e., 16 KB)

The Kafka producer batches messages for each partition before sending them to the broker that leads that partition. This parameter controls the maximum amount of memory (in bytes) used for each batch. The producer uses the batch size and the timeout (linger.ms) to decide when to send: it tries to accumulate as many messages as possible (up to batch.size) and then sends them all in one go. If the batch size is very small, the producer sends messages more frequently (a value of 0 disables batching). A larger batch size may waste some memory, as the allocated memory might not get fully utilized.
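The per-partition batching described above can be sketched as a toy model. This is not how the Kafka client is implemented; the function name `group_into_batches` and the sample records are assumptions made for illustration, and the timeout is ignored here.

```python
from collections import defaultdict

BATCH_SIZE = 16384  # bytes, mirrors the batch.size default


def group_into_batches(records, batch_size=BATCH_SIZE):
    """Group (partition, payload) records into per-partition batches.

    A batch is closed when adding the next payload would exceed batch_size.
    """
    open_batches = defaultdict(list)  # partition -> payloads in the open batch
    open_bytes = defaultdict(int)     # partition -> bytes used by the open batch
    closed = []                       # (partition, payloads) ready to send

    for partition, payload in records:
        would_overflow = open_bytes[partition] + len(payload) > batch_size
        if open_batches[partition] and would_overflow:
            closed.append((partition, open_batches[partition]))
            open_batches[partition] = []
            open_bytes[partition] = 0
        open_batches[partition].append(payload)
        open_bytes[partition] += len(payload)

    # Flush whatever is left over at the end of the input.
    for partition, payloads in open_batches.items():
        if payloads:
            closed.append((partition, payloads))
    return closed


# Ten 6000-byte records, alternating between partitions 0 and 1.
records = [(i % 2, b"x" * 6000) for i in range(10)]
batches = group_into_batches(records)
for partition, payloads in batches:
    print(partition, len(payloads), sum(len(p) for p in payloads))
```

With 6000-byte records and a 16384-byte limit, only two records fit in a batch, so each partition ends up with batches of 2, 2, and 1 records.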


linger.ms

Default value = 0

This value allows the producer to group records/messages together before they are sent to the broker. It is the amount of time, in milliseconds, that the producer waits to accumulate messages in a batch. If this value is not set (the default of 0), the producer sends messages as and when they arrive, so latency is lowest with the default. Setting this value to, say, 5 increases latency but also increases throughput (you can send more messages in one go, so there is less overhead per message). If there is no load, setting it to 5 increases latency by up to 5 ms.
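The latency/throughput trade-off can be illustrated with a toy simulation. It assumes a single partition, ignores batch.size, and sends a batch once the linger time has passed since the batch's first record; the function name `simulate_linger` and the arrival times are made up for the example.

```python
def simulate_linger(arrival_times_ms, linger_ms):
    """Group record arrival times into batches.

    A batch is sent linger_ms after its first record arrived.
    Toy model: single partition, no batch.size limit.
    """
    batches = []
    current = []
    deadline = None
    for t in sorted(arrival_times_ms):
        # The open batch has already been sent if its deadline has passed.
        if current and t > deadline:
            batches.append(current)
            current = []
        # The first record of a new batch starts the linger timer.
        if not current:
            deadline = t + linger_ms
        current.append(t)
    if current:
        batches.append(current)
    return batches


arrivals = [0, 1, 2, 7, 8, 20]  # hypothetical arrival times in milliseconds
print(simulate_linger(arrivals, linger_ms=0))
print(simulate_linger(arrivals, linger_ms=5))
```

With these arrival times, a linger of 0 produces six single-record batches, while a linger of 5 produces three batches: [0, 1, 2], [7, 8], and [20]. That means fewer requests, at the cost of each batch waiting up to 5 ms.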


acks

Default value = 1 (in Kafka 3.0 and later, the default is all)

This controls the number of acknowledgments the producer requires the leader to have received before considering a request complete. This affects the durability of the message.

acks=0

The message is considered to be written successfully to Kafka if the producer managed to send it over the network. The producer does not wait for any acknowledgment from the broker.
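Putting the parameters from this post together, a producer configuration might look like the sketch below. The values are illustrative rather than recommendations, and the broker address is an assumption. The keys are the Kafka property names; some Python clients (confluent-kafka, for example) accept a dictionary in this shape, but check your client's documentation before relying on it.

```python
# Illustrative values only; tune them for your own workload.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "compression.type": "snappy",           # low CPU overhead, decent ratio
    "batch.size": 32768,                    # allow up to 32 KB per batch
    "linger.ms": 5,                         # wait up to 5 ms to fill a batch
    "acks": "0",                            # fire-and-forget: no broker acknowledgment
}

for key, value in producer_config.items():
    print(f"{key}={value}")
```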

