Example Kafka lag monitoring of consumer groups using Kafka AdminClient, Kamon and InfluxDB.
NOTE: It does not require access to Zookeeper.
This application is dockerized and the latest image can be found on Docker Hub mwizner/kafka-kamon-lag-monitor.
The application can be configured using the following environment variables:
KAFKA_BOOTSTRAP_SERVERS- List of Kafka servers used to bootstrap connections to Kafka.KAFKA_SECURITY_PROTOCOL- Protocol used to communicate with brokers.PLAINTEXTis the default value, useSSLto communicate with brokers via SSL.MONITOR_CONSUMER_GROUP- Consumer group the monitor belongs to.MONITOR_CLIENT_ID- Client id of the monitor.CONSUMER_GROUPS- Consumer groups which offset will be monitored, e.g.group1orgroup1,group2,group3.POLL_INTERVAL- (optional) Time interval for polling Kafka, defaults to5 seconds.GROUP_LAG_HISTOGRAM_NAME- (optional) Name of the consumer group lag histogram, defaults tokafka-consumer-groups-lag.GROUP_STATE_HISTOGRAM_NAME- (optional) Name of the consumer group state histogram, defaults tokafka-consumer-groups-state.TICK_INTERVAL- (optional) Time interval for collecting the metrics, defaults to10 seconds.INFLUXDB_HOSTNAME- Hostname on which InfluxDB is running.INFLUXDB_PORT- Port on which InfluxDB is listening.EXIT_ON_ERROR- (optional) Exit with code 1 on any error thrown in the app, defaults tofalse. If you're running this in docker, you can use the--restart unless-stoppedflag to restart the app when it exits.
The following settings are optional and apply only if KAFKA_SECURITY_PROTOCOL is set to SSL.
KAFKA_SSL_PROTOCOL- The SSL protocol used to generate the SSLContext, e.g.TLS.KAFKA_SSL_KEY_PASSWORD- The password of the private key in the key store file.KAFKA_SSL_KEYSTORE_TYPE- The file format of the key store file, e.g.PKCS12.KAFKA_SSL_KEYSTORE_PATH- The location of the key store file.KAFKA_SSL_KEYSTORE_PASSWORD- The password for the key store file.KAFKA_SSL_TRUSTSTORE_TYPE- The file format of the trust store file, e.g.JKS.KAFKA_SSL_TRUSTSTORE_PATH- The location of the trust store file.KAFKA_SSL_TRUSTSTORE_PASSWORD- The password for the trust store file.
An example config:
KAFKA_BOOTSTRAP_SERVERS=localhost:9092
KAFKA_SECURITY_PROTOCOL=SSL
KAFKA_SSL_PROTOCOL=TLS
KAFKA_SSL_KEY_PASSWORD=password
KAFKA_SSL_KEYSTORE_PATH=/etc/opt/kafka-kamon-lag-monitor/keystore.p12
KAFKA_SSL_KEYSTORE_PASSWORD=password
KAFKA_SSL_KEYSTORE_TYPE=PKCS12
KAFKA_SSL_TRUSTSTORE_PATH=/etc/opt/kafka-kamon-lag-monitor/truststore.jks
KAFKA_SSL_TRUSTSTORE_PASSWORD=password
KAFKA_SSL_TRUSTSTORE_TYPE=JKS
MONITOR_CONSUMER_GROUP=lag-monitor
MONITOR_CLIENT_ID=lag-monitor
CONSUMER_GROUPS=service1,service2
INFLUXDB_HOSTNAME=localhost
INFLUXDB_PORT=8189
EXIT_ON_ERROR=true
The app records Kafka metrics using Kamon histogram instruments - the measurements are exposed in InfluxDB as kafka-lag-monitor-timers and tagged with kafka-consumer-groups-lag and kafka-consumer-groups-state entities. Kafka consumer group states get translated into the following values in the histogram:
- 0 -
DeadorEmpty - 1 -
Stable - 2 -
PreparingRebalanceorCompletingRebalance - 3 - any other states
- 1.3.0 - Introduced a new config option to exit the application on error.
- 1.2.0 - Fixed InfluxDB parsing issue & changed the consumer group state metric value to 3 for any unrecognised consumer group states.
- 1.1.0 - Added metrics for consumer group state.
- 1.0.1 - Made
KAFKA_SECURITY_PROTOCOLenvironment variable optional. - 1.0.0 - Forked from bfil/kafka-kamon-lag-monitor and dockerized.