Kafka monitoring setup - yau

Overall approach: use Prometheus + Grafana. The first task was finding a suitable exporter. I tried a kafka-exporter first, but it exposes too few metrics to meet our needs. After more searching, I settled on JMX-based collection to capture as many Kafka metrics as possible, using jmx_prometheus_javaagent-0.6.jar together with kafka-0-8-2.yml.

Other existing tools such as KafkaOffsetMonitor, Burrow, kafka-monitor, and Kafka-Manager mostly monitor topic writes and reads; they provide no cluster-wide information such as partition state, latency, or memory usage.

Steps (Prometheus and Grafana deployment omitted):

1. Put the two files in the Kafka home directory, then change the Kafka startup so the JVM loads the Java agent; only then can the metrics be collected.
2. Start command:

```shell
KAFKA_OPTS="$KAFKA_OPTS -javaagent:../jmx_prometheus_javaagent-0.6.jar=7071:../kafka-0-8-2.yml" ./kafka-server-start.sh -daemon ../config/server.properties
```

3. `curl ip:7071/metrics` returns every metric, more than 10,000 series in total.
4. Configure the scrape job in Prometheus and start Prometheus.
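Step 3 can be sanity-checked from the shell without reading all 10,000+ lines. A minimal sketch, using a fabricated three-line sample in place of the real `/metrics` output; the metric names below are assumptions in the JMX-exporter naming style, not names guaranteed by kafka-0-8-2.yml:

```shell
# Fabricated sample standing in for `curl -s ip:7071/metrics` output.
# The metric names are placeholders (assumption), not real kafka-0-8-2.yml output.
cat > /tmp/metrics_sample.txt <<'EOF'
kafka_server_replicamanager_underreplicatedpartitions 0.0
kafka_server_brokertopicmetrics_messagesin_total{topic="test"} 1234.0
kafka_network_requestmetrics_totaltimems{request="Produce"} 5.0
EOF

# How many series does the broker expose? (On a real broker this is ~10k.)
wc -l < /tmp/metrics_sample.txt

# Pull out just the series you care about instead of eyeballing the full dump.
grep '^kafka_server' /tmp/metrics_sample.txt
```

The same `grep` works against a live broker by piping `curl -s ip:7071/metrics` into it.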

```yaml
# my global config
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'druid_exporter'
    static_configs:
      - targets: ['10.165.6.157:8000']

  - job_name: 'kafka_exporter'
    static_configs:
      - targets: ['10.165.23.149:9308']

  - job_name: 'kafka'
    static_configs:
      - targets: ['10.165.23.149:7071', '10.165.23.202:7071', '10.165.23.204:7071']
```

5. On the Prometheus targets page, check that each target's state is "up".
6. Add Prometheus as a data source in Grafana.
7. Install a Kafka dashboard plugin in Grafana to build panels quickly (fully custom panels also work); the query syntax inside the panels is just the metric names themselves.
8. Configure a watchdog task that restarts Kafka automatically:

```shell
#!/bin/bash
# yau
# restart kafka
cd /usr/local/app/kafka/bin

function stop() {
  echo "stop kafka"
  PIDS=$(ps ax | grep -i 'kafka' | grep java | grep -v grep | awk '{print $1}')
  if [ -z "$PIDS" ]; then
    echo "No kafka server to stop"
  else
    kill -9 $PIDS
  fi
}

function start() {
  echo "starting kafka"
  KAFKA_OPTS="$KAFKA_OPTS -javaagent:../jmx_prometheus_javaagent-0.6.jar=7071:../kafka-0-8-2.yml" ./kafka-server-start.sh -daemon ../config/server.properties
}

function moni() {
  echo "moni kafka.."
  if ! ps ax | grep -i 'kafka' | grep java | grep -v grep &> /dev/null; then
    start
  fi
}

case $1 in
  start)   start ;;
  stop)    stop ;;
  restart) stop && start ;;
  moni)    moni ;;
  *)
    echo "Usage: $0 [start|stop|restart|moni]"
    exit 1
    ;;
esac
```

Open question: the three nodes are monitored individually; how should aggregated monitoring be configured? Resolved: simply add more list elements to `targets` in the Prometheus config.

Remaining notes:

- The Grafana Kafka plugin was added through the web UI and has not been merged into the image.
- Prometheus is temporarily running on 151.100.
- The dashboards still need to be fleshed out.
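Beyond listing all three brokers under one job, cluster-level aggregation can also happen at query time: since the brokers share the `kafka` job label, PromQL can sum over them. A sketch of a rule file that could be wired into the commented-out `rule_files` section above; the metric name is an assumption and must be replaced with one actually emitted by kafka-0-8-2.yml:

```yaml
# first_rules.yml (hypothetical): cluster-wide aggregation over the three brokers.
groups:
  - name: kafka-cluster
    rules:
      # Total message-in rate across the whole cluster, not per broker.
      # NOTE: kafka_server_brokertopicmetrics_messagesin_total is a placeholder
      # name; check `curl ip:7071/metrics` for the real one.
      - record: job:kafka_messages_in:rate5m
        expr: sum(rate(kafka_server_brokertopicmetrics_messagesin_total{job="kafka"}[5m]))
```

The recorded series can then back a single Grafana panel instead of three per-node ones.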