Monitoring Kafka w/ Prometheus



Yuto Kawamura (kawamuray)

About me
● Software Engineer @ LINE corp
○ Develop & operate Apache HBase clusters
○ Design and implement data flow between services with ♥ to Apache Kafka
● Recent works
○ Playing with Apache Kafka and Kafka Streams
● Past works
○ Docker + Checkpoint Restore @ CoreOS meetup http://www.slideshare.
○ Aggregating application logs with Norikra for real-time error notification @ Norikra Meetup #1
○ Student @ Google Summer of Code 2013, 2014

How are we (our team) using Prometheus?
● To monitor most of our middleware and the clients in our Java applications
○ Kafka clusters
○ HBase clusters
○ Kafka clients - producer and consumer
○ Stream Processing jobs

Overall Architecture
[Diagram: HBase clusters, Kafka cluster, YARN Application; Direct query]

Why Prometheus?
● Inhouse monitoring tool wasn’t enough for large-scale + high resolution metrics collection
● Good data model
○ Genuine metric identifier + attributes as labels
■ http_requests_total{code="200",handler="prometheus",instance="localhost:9090",job="prometheus",method="get"}
● Scalable by nature
● Simple philosophy
○ Metrics exposure interface: GET /metrics => Text Protocol
○ Monolithic server
● Flexible but easy PromQL
○ Derive aggregated metrics by composing existing metrics
○ E.g., sum of RX bps of the entire cluster
■ sum(rate(node_network_receive_bytes{cluster="cluster-A",device="eth0"}[30s]) * 8)
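The PromQL example turns a per-host byte counter into bits per second and sums it over the cluster. A minimal Java sketch of that arithmetic follows. It is a simplification: it takes only the first and last sample of each window and ignores the counter-reset handling and extrapolation that the real rate() performs. The class and method names are hypothetical and not part of Prometheus.

```java
import java.util.List;

/** Hypothetical helper, not part of Prometheus: mimics sum(rate(bytes[window]) * 8). */
class ClusterBpsSketch {

    /** Simplified per-second rate from the first and last sample in a window. */
    static double ratePerSecond(double firstValue, double firstTs, double lastValue, double lastTs) {
        return (lastValue - firstValue) / (lastTs - firstTs);
    }

    /**
     * Each row is {firstBytes, firstTimestampSec, lastBytes, lastTimestampSec} for one host.
     * Returns the cluster-wide sum in bits per second.
     */
    static double clusterBitsPerSecond(List<double[]> hosts) {
        double sum = 0.0;
        for (double[] h : hosts) {
            sum += ratePerSecond(h[0], h[1], h[2], h[3]) * 8; // bytes/s -> bits/s
        }
        return sum;
    }

    public static void main(String[] args) {
        List<double[]> hosts = List.of(
            new double[] {1_000, 0, 31_000, 30},   // 1000 bytes/s over a 30s window
            new double[] {5_000, 0, 65_000, 30});  // 2000 bytes/s over a 30s window
        System.out.println(clusterBitsPerSecond(hosts)); // (1000 + 2000) * 8 bits/s
    }
}
```

Multiplying by 8 inside the sum, as the query does, gives the same total as multiplying the summed rate once.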

Deployment
● Launch
○ Official Docker image:
○ Ansible for dynamic prometheus.yml generation based on inventory and container management
● Machine spec
○ 2.40GHz * 24 CPUs
○ 192GB RAM
○ 6 SSDs
○ Single SSD / Single Prometheus instance
○ Overkill? => Obviously. Reused existing unused servers. You certainly don’t need this crazy spec just to use it.

Kafka monitoring w/ Prometheus overview
[Diagram: Prometheus Server scraping the monitored targets through exporters]
● Monitored targets: Kafka broker, Kafka client in Java application, YARN ResourceManager, Stream Processing jobs on YARN
● Exporters: Jmx exporter, Prometheus Java library + Servlet, JSON exporter, Kafka consumer group exporter

Monitoring Kafka brokers - jmx_exporter
● Run as standalone process (no -javaagent)
○ Just in order to avoid a cumbersome rolling restart
○ May switch to javaagent at the next opportunity for a rolling restart :p
● With very complicated config.yml
● Colocate one instance per broker on the same host

Monitoring Kafka producer on Java application - prometheus_simpleclient
● Official Java client library

prometheus_simpleclient - Basic usage

private static final Counter queueOutCounter = Counter.build()
    .namespace("kafka_streams")        // Namespace (= application prefix?)
    .name("process_count")             // Metric name
    .help("Process calls count")       // Metric description
    .labelNames("processor", "topic")  // Declare labels
    .register();                       // Register to CollectorRegistry.defaultRegistry (default, global registry)

...
queueOutCounter.labels("Processor-A", "topic-T").inc();     // Increment counter with labels
queueOutCounter.labels("Processor-B", "topic-P").inc(2.0);

=>
kafka_streams_process_count{processor="Processor-A",topic="topic-T"} 1.0
kafka_streams_process_count{processor="Processor-B",topic="topic-P"} 2.0
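The exposed lines follow the Prometheus text format: metric name, label pairs in braces, then the value. As a rough illustration of that layout only, here is a JDK-only sketch that renders one sample line. It is not how the client library does it internally, and `TextFormatSketch` and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical illustration of one text-format sample line; not the client library's code. */
class TextFormatSketch {

    /** Escapes backslash, double quote and line feed inside a label value. */
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    /** Renders name{label="value",...} value; the braces are omitted when there are no labels. */
    static String sampleLine(String name, Map<String, String> labels, double value) {
        StringJoiner pairs = new StringJoiner(",", "{", "}");
        pairs.setEmptyValue("");
        for (Map.Entry<String, String> e : labels.entrySet()) {
            pairs.add(e.getKey() + "=\"" + escape(e.getValue()) + "\"");
        }
        return name + pairs + " " + value;
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("processor", "Processor-A");
        labels.put("topic", "topic-T");
        System.out.println(sampleLine("kafka_streams_process_count", labels, 1.0));
    }
}
```

A `LinkedHashMap` keeps the labels in declaration order, which matches how the sample lines on the slide are laid out.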

Exposing Java application metrics
● Through servlet
○ io.prometheus.client.exporter.MetricsServlet from simpleclient_servlet
● Add an entry to web.xml, or use embedded Jetty

Server server = new Server(METRICS_PORT);
ServletContextHandler context = new ServletContextHandler();
context.setContextPath("/");
server.setHandler(context);
context.addServlet(new ServletHolder(new MetricsServlet()), "/metrics");
server.start();

Monitoring Kafka producer on Java application - prometheus_simpleclient
● Primitive types:
○ Counter, Gauge, Histogram, Summary
● Kafka’s MetricsReporter interface gives KafkaMetric instances
● How to expose the value?
● => Implement a proxy metric type which extends SimpleCollector

public class PrometheusMetricsReporter implements MetricsReporter {
    ...
    private void registerMetric(KafkaMetric kafkaMetric) {
        ...
            .namespace("kafka")
            .name(fqn)
            .help("Help: " + metricName.description())
            .labelNames(labelNames)
            .register();
        ...
    }
    ...
}

public class KafkaMetricProxy extends SimpleCollector<KafkaMetricProxy.Child> {
    public static class Builder extends SimpleCollector.Builder<Builder, KafkaMetricProxy> {
        @Override
        public KafkaMetricProxy create() {
            return new KafkaMetricProxy(this);
        }
    }

    KafkaMetricProxy(Builder b) {
        super(b);
    }

    @Override
    public List<MetricFamilySamples> collect() {
        List<MetricFamilySamples.Sample> samples = new ArrayList<>();
        // One sample per registered label combination, read from the wrapped KafkaMetric
        for (Map.Entry<List<String>, Child> entry : children.entrySet()) {
            List<String> labels = entry.getKey();
            Child child = entry.getValue();
            samples.add(new MetricFamilySamples.Sample(fullname, labelNames, labels, child.getValue()));
        }
        return Collections.singletonList(new MetricFamilySamples(fullname, Type.GAUGE, help, samples));
    }
}

Monitoring YARN jobs - json_exporter
● Can export values from JSON by specifying each value as a JSONPath
● http://<rm http address:port>/ws/v1/cluster/apps

json_exporter

- name: yarn_application
  type: object
  path: $[*]?(@.state == "RUNNING")
  labels:
    application: $.id
    phase: beta
  values:
    alive: 1
    elapsed_time: $.elapsedTime
    allocated_mb: $.allocatedMB
...

{"apps":{"app":[
  {
    "id": "application_1234_0001",
    "state": "RUNNING",
    "elapsedTime": 25196,
    "allocatedMB": 1024,
    ...
  },
  ...
]}}

yarn_application_alive{application="application_1234_0001",phase="beta"} 1
yarn_application_elapsed_time{application="application_1234_0001",phase="beta"} 25196
yarn_application_allocated_mb{application="application_1234_0001",phase="beta"} 1024

Important configurations
● -storage.local.retention (default: 15 days)
○ TTL for collected values
● -storage.local.memory-chunks (default: 1M)
○ Practically controls memory allocation of Prometheus instance
○ Lower value can cause ingestion throttling (metric loss)
● -storage.local.max-chunks-to-persist (default: 512K)
○ Lower value can cause ingestion throttling likewise
○ > Equally important, especially if writing to a spinning disk, is raising the value for the storage.local.max-chunks-to-persist flag. As a rule of thumb, keep it around 50% of the storage.local.memory-chunks value.
● -query.staleness-delta (default: 5mins)
○ Resolution to detect lost metrics
○ Can lead to weird behavior on Prometheus WebUI

Query tips - label_replace function
● It’s quite common that two metrics have different label sets
○ E.g., server side metrics and client side metrics
● Say we have a metric like:
○ kafka_log_logendoffset{cluster="cluster-A",instance="HOST:PORT",job="kafka",partition="1234",topic="topic-A"}
● Introduce new label from existing label
○ label_replace(..., "host", "$1", "instance", "^([^:]+):.*")
○ => kafka_log_logendoffset{...,instance="HOST:PORT",host="HOST"}
● Rewrite existing label with new value
○ label_replace(..., "instance", "$1", "instance", "^([^:]+):.*")
○ => kafka_log_logendoffset{...,instance="HOST"}
● Even possible to rewrite metric name… :D
○ label_replace(kafka_log_logendoffset, "__name__", "foobar", "__name__", ".*")
○ => foobar{...}
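To make the label_replace semantics concrete, here is a small Java sketch that applies the same idea to a single label set. The regex must match the whole source label value, and if it does not match, the labels come back unchanged. This only models the behavior with java.util.regex. Prometheus itself uses Go regular expressions, and `LabelReplaceSketch` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of label_replace(v, dst, replacement, src, regex) on one label set. */
class LabelReplaceSketch {

    static Map<String, String> labelReplace(Map<String, String> labels, String dst,
                                            String replacement, String src, String regex) {
        Map<String, String> out = new LinkedHashMap<>(labels);
        // A missing source label is treated as an empty string
        Matcher m = Pattern.compile(regex).matcher(labels.getOrDefault(src, ""));
        if (m.matches()) { // full match required, not a substring search
            StringBuffer expanded = new StringBuffer();
            m.appendReplacement(expanded, replacement); // expands $1, $2, ... from the match
            out.put(dst, expanded.toString());
        }
        return out; // unchanged copy when the regex does not match
    }

    public static void main(String[] args) {
        Map<String, String> series = new LinkedHashMap<>();
        series.put("instance", "HOST:PORT");
        series.put("topic", "topic-A");
        // Adds host="HOST" while keeping instance="HOST:PORT"
        System.out.println(labelReplace(series, "host", "$1", "instance", "^([^:]+):.*"));
    }
}
```

Passing the same name for `dst` and `src` overwrites the label in place, which corresponds to the "rewrite existing label" case.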

Points to improve
● Service discovery
○ It’s too cumbersome to configure server list and exporter list statically
○ Pushgateway?
■ > The Prometheus Pushgateway exists to allow ephemeral and batch jobs to expose their metrics to Prometheus
○ file_sd_config? <file_sd_config>
■ > It reads a set of files containing a list of zero or more <target_group>s. Changes to all defined files are detected via disk watches and applied immediately.
● Local time support :(
○ They don’t like TZ other than UTC; makes sense though: https://prometheus.io/docs/introduction/faq/#can-i-change-the-timezone?-why-is-everything-in-utc?
○ Might still be possible to introduce a toggle in the view

Conclusion
● Data model is very intuitive
● PromQL is very powerful and relatively easy
○ Helps you find the important metrics among hundreds of metrics
● A few pitfalls need to be avoided by tuning configurations
○ memory-chunks, query.staleness-delta…
● Building an exporter is reasonably easy
○ Lots of languages are officially supported…
○ /metrics is the only interface


End of Presentation


○ kafka_producer_request_rate
○ http_request_duration
● Fully utilize labels
○ x: kafka_network_request_duration_milliseconds_{max,min,mean}
○ o: kafka_network_request_duration_milliseconds{aggregation="max|min|mean"}
○ Compare all min/max/mean in single graph: kafka_network_request_duration_milliseconds{instance="HOSTA"}
○ Much more flexible than using static names

Alerting
● Not using Alert Manager
● Inhouse monitoring tool has alerting capability
○ Has user directory of alerting targets
○ Has a known expression for configuring alerts
○ Tool unification is important and should be respected as much as possible
● Then?
○ Built a tool to mirror metrics from Prometheus to inhouse monitoring tool
○ Set up alerts on inhouse monitoring tool

/api/v1/query?query=sum(kafka_stream_process_calls_rate{client_id=~"CLIENT_ID.*"}) by (instance)

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      { "metric": { "instance": "HOST_A:PORT" }, "value": [ 1465819064.067, "82317.10280584119" ] },
      { "metric": { "instance": "HOST_B:PORT" }, "value": [ 1465819064.067, "81379.73499610288" ] }
    ]
  }
}
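A mirroring tool like the one described has to send PromQL to the /api/v1/query endpoint, and the expression needs URL encoding because of characters such as `{`, `"` and spaces. The following JDK-only sketch (Java 11+) shows one way to build and fetch such a request. The base URL, class name and method names are assumptions for illustration, and the actual in-house tool is not shown in the slides.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of querying the Prometheus HTTP API; not the in-house mirroring tool. */
class QueryUrlSketch {

    /** Builds an instant-query URI, form-encoding the PromQL expression. */
    static URI instantQuery(String baseUrl, String promql) {
        String encoded = URLEncoder.encode(promql, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/v1/query?query=" + encoded);
    }

    /** Fetches the raw JSON response body; parsing is left to a JSON library of your choice. */
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // "http://localhost:9090" is an assumed address of a Prometheus server
        URI uri = instantQuery("http://localhost:9090",
            "sum(kafka_stream_process_calls_rate{client_id=~\"CLIENT_ID.*\"}) by (instance)");
        System.out.println(uri);
    }
}
```

Each element of `data.result` in the response carries the label set under `metric` and a `[timestamp, "value"]` pair, so the mirror only needs those two fields per series.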

public class KafkaMetricProxy extends SimpleCollector<KafkaMetricProxy.Child> {
    ...
    public static class Child {
        private KafkaMetric kafkaMetric;

        public void setKafkaMetric(KafkaMetric kafkaMetric) {
            this.kafkaMetric = kafkaMetric;
        }

        double getValue() {
            return kafkaMetric == null ? 0 : kafkaMetric.value();
        }
    }

    @Override
    protected Child newChild() {
        return new Child();
    }
    ...
}

Monitoring Kafka consumer offset - kafka_consumer_group_exporter
● Exports some metrics WRT Kafka consumer group by executing kafka- command (bundled with Kafka)
● Specific exporter for specific use
● Better to be familiar with your favorite exporter framework
○ Raw use of official prometheus package:
○ Mine:

Query tips - Product set
● The result of a calculation between two or more metrics is determined by matching their label sets
● metric_A{cluster="A or B"}
● metric_B{cluster="A or B",instance="a or b or c"}
● metric_A / metric_B
● => {}
● metric_A / sum(metric_B) by (cluster)
● => {cluster="A or B"}
● x: metric_A{cluster="A"} - sum(metric_B{cluster="A"}) by (cluster)
● o: metric_A{cluster="A"} - sum(metric_B) by (cluster) => Same result!
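The matching behavior in these bullets can be modeled in a few lines of Java. This is a sketch of default one-to-one matching only, with each series keyed by its full label set (metric name excluded). The class and method names are hypothetical, and real PromQL also offers on/ignoring/group_left modifiers that are not modeled here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of PromQL one-to-one vector matching and sum(...) by (label). */
class VectorMatchSketch {

    /** lhs / rhs: a result exists only where both sides have an identical label set. */
    static Map<Map<String, String>, Double> divide(Map<Map<String, String>, Double> lhs,
                                                   Map<Map<String, String>, Double> rhs) {
        Map<Map<String, String>, Double> out = new HashMap<>();
        for (Map.Entry<Map<String, String>, Double> e : lhs.entrySet()) {
            Double r = rhs.get(e.getKey());
            if (r != null) {
                out.put(e.getKey(), e.getValue() / r);
            }
        }
        return out;
    }

    /** sum(v) by (label): drops every other label and adds up values per remaining label value. */
    static Map<Map<String, String>, Double> sumBy(Map<Map<String, String>, Double> v, String label) {
        Map<Map<String, String>, Double> out = new HashMap<>();
        for (Map.Entry<Map<String, String>, Double> e : v.entrySet()) {
            Map<String, String> key = Map.of(label, e.getKey().getOrDefault(label, ""));
            out.merge(key, e.getValue(), Double::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Map<String, String>, Double> metricA = new HashMap<>();
        metricA.put(Map.of("cluster", "A"), 10.0);
        metricA.put(Map.of("cluster", "B"), 20.0);

        Map<Map<String, String>, Double> metricB = new HashMap<>();
        metricB.put(Map.of("cluster", "A", "instance", "a"), 2.0);
        metricB.put(Map.of("cluster", "A", "instance", "b"), 3.0);
        metricB.put(Map.of("cluster", "B", "instance", "a"), 4.0);

        System.out.println(divide(metricA, metricB));                   // no identical label sets
        System.out.println(divide(metricA, sumBy(metricB, "cluster"))); // one result per cluster
    }
}
```

Because matching already restricts the result to label sets present on the left-hand side, filtering only metric_A by cluster yields the same output as filtering both sides.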
