Consumer Lag in MSK (Kafka) with Burrow

AS
Jun 2, 2020

Amazon MSK is widely used as a fully managed service for streaming data. MSK provides several out-of-the-box monitoring capabilities, such as disk usage per broker, CPU usage per broker, and network TX (transmit) and RX (receive) packets per broker. Additional CloudWatch metrics are available on demand and incur extra cost.

What is consumer lag?
Consumer lag is the gap between the Kafka producer and the Kafka consumer: the amount of data that has been produced but not yet consumed.

Consumer lag is an important metric for production environments. When dealing with a huge amount of streaming data, we often have to decrease the topic's retention period, which defaults to 7 days.

If consumer lag keeps increasing, the consuming application can fall out of sync: the offsets it requests are no longer available in Kafka because the retention policy has already deleted them.
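Conceptually, per-partition lag is simply the broker's latest (log-end) offset minus the group's last committed offset. A minimal sketch with made-up offsets:

```shell
# Hypothetical offsets for a single partition (illustration only)
LOG_END_OFFSET=1500      # latest offset produced to the partition
COMMITTED_OFFSET=1200    # last offset committed by the consumer group

# Lag = messages produced but not yet consumed
LAG=$((LOG_END_OFFSET - COMMITTED_OFFSET))
echo "consumer lag: $LAG messages"
```

Burrow computes exactly this difference per partition, continuously, and evaluates how it changes over time rather than alerting on a single threshold.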

Installing Burrow

https://docs.aws.amazon.com/msk/latest/developerguide/monitoring.html#burrow

## Install Go
sudo yum install go
## Get the Burrow project
go get github.com/linkedin/Burrow
## Install dep, Go's dependency tool (installs to /home/ec2-user/go/bin/dep)
curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh
## Ensure all the dependencies
cd /home/ec2-user/go/src/github.com/linkedin/Burrow
/home/ec2-user/go/bin/dep ensure
## Install
go install

Configuration

Configuration file: /home/ec2-user/go/src/github.com/linkedin/Burrow/config/burrow.toml

[general]
pidfile="burrow.pid"
stdout-logfile="burrow.out"
access-control-allow-origin="*"
[logging]
filename="logs/burrow.log"
level="info"
maxsize=100
maxbackups=30
maxage=10
use-localtime=false
use-compression=true
[zookeeper]
servers=[ "$ZooKeeper-Address" ]
root-path="/burrow"
[client-profile.burrow-client]
client-id="burrow-client"
kafka-version="2.2.1"
[cluster.my-kafka]
class-name="kafka"
servers=[ "$KAFKA SERVER" ]
client-profile="burrow-client"
topic-refresh=120
offset-refresh=30
[consumer.my-kafka]
class-name="kafka"
cluster="my-kafka"
servers=[ "$KAFKA SERVER" ]
client-profile="burrow-client"
offsets-topic="__consumer_offsets"
start-latest=true
group-blacklist="^(console-consumer-|python-kafka-consumer-|quick-).*$"
group-whitelist=""
[httpserver.default]
address=":8000"
[storage.default]
class-name="inmemory"
workers=20
intervals=15
expire-group=604800
min-distance=1
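Once Burrow is running with this config (the binary built by `go install` above, e.g. `/home/ec2-user/go/bin/Burrow --config-dir config`), lag can be queried over the HTTP server on port 8000. The group name `my-group` and the abridged response below are illustrative; the real v3 response carries much more detail per partition:

```shell
# Query Burrow's v3 endpoint for a group's lag status (requires a running Burrow):
#   curl -s http://localhost:8000/v3/kafka/my-kafka/consumer/my-group/lag

# Abridged sample response, for illustration only
RESPONSE='{"error":false,"status":{"cluster":"my-kafka","group":"my-group","totallag":300}}'

# Pull out the total lag (python3 used here instead of jq, which may not be installed)
echo "$RESPONSE" | python3 -c 'import sys, json; print(json.load(sys.stdin)["status"]["totallag"])'
```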

Installing Telegraf

https://docs.influxdata.com/telegraf/v1.14/introduction/installation/

cat <<EOF | sudo tee /etc/yum.repos.d/influxdb.repo
[influxdb]
name = InfluxDB Repository - RHEL \$releasever
baseurl = https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key
EOF
sudo yum install telegraf
sudo service telegraf start

Update the Telegraf config file to collect input from Burrow

Telegraf collects metrics from Burrow's HTTP endpoint and sends them to a time-series database, InfluxDB in this case.

https://github.com/influxdata/telegraf/tree/master/plugins/inputs/burrow

[[inputs.burrow]]
## Burrow API endpoints in format "schema://host:port".
## Default is "http://localhost:8000".
servers = ["http://localhost:8000"]
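For completeness, Telegraf also needs an output section; a minimal InfluxDB output might look like the following (the URL and database name are assumptions for a local setup, not values from this post):

```toml
[[outputs.influxdb]]
## InfluxDB HTTP endpoint (assumed local instance)
urls = ["http://localhost:8086"]
## Database to write the Burrow metrics into
database = "telegraf"
```

After editing `/etc/telegraf/telegraf.conf`, restart Telegraf for the changes to take effect.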

Grafana dashboard for consumer lag

https://grafana.com/grafana/dashboards/10207

Software engineer at Expedia Group. Passionate about data-science and Big Data technologies.