Kafka monitoring setup – yau

Main idea: use Prometheus + Grafana. The first task was to find a suitable Kafka exporter. An initial kafka-exporter exposed too few metrics and did not meet our requirements. Continuing the search, we decided to use JMX monitoring to capture as many Kafka metrics as possible, with jmx_prometheus_javaagent-0.6.jar + kafka-0-8-2.yml. Other existing tools such as KafkaOffsetMonitor, Burrow, kafka-monitor, and Kafka-Manager mostly monitor topic writes and reads; they do not provide cluster-wide information such as partitions, latency, and memory usage.

Steps (skipping the deployment of Prometheus and Grafana):
1. Put the two files above into the Kafka home directory. Kafka must then be started with the java agent attached to the JVM, so that the metrics can be collected.
2. Start command:
KAFKA_OPTS="$KAFKA_OPTS -javaagent:../jmx_prometheus_javaagent-0.6.jar=7071:../kafka-0-8-2.yml" ./kafka-server-start.sh -daemon ../config/server.properties
3. curl ip:7071/metrics returns all available metrics, roughly 10,000 or more.
4. Configure these scrape targets in Prometheus, then start Prometheus. prometheus.yml:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'druid_exporter'
    static_configs:
    - targets: ['10.165.6.157:8000']
  - job_name: 'kafka_exporter'
    static_configs:
    - targets: ['10.165.23.149:9308']
  - job_name: 'kafka'
    static_configs:
    - targets: ['10.165.23.149:7071','10.165.23.202:7071','10.165.23.204:7071']
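Step 3's curl returns metrics in the Prometheus text exposition format. As a quick sanity check of that output (sketched here against a small inlined sample, since the real endpoint needs a running broker; the metric names below are made up for illustration), ordinary grep one-liners are enough:

```shell
# Save a small sample in the Prometheus text format that ip:7071/metrics
# would return (metric names are hypothetical).
cat > /tmp/sample_metrics.txt <<'EOF'
# HELP kafka_server_messagesin_total Messages in per broker
# TYPE kafka_server_messagesin_total counter
kafka_server_messagesin_total 12345
kafka_network_totaltimems 42
EOF

# Count the exposed metric lines, ignoring '#' comment lines;
# against the real endpoint this is where the "10,000+" figure comes from.
grep -cv '^#' /tmp/sample_metrics.txt   # -> 2

# Filter for one metric family of interest.
grep '^kafka_server_' /tmp/sample_metrics.txt
```

Against a live broker, replace the sample file with `curl -s ip:7071/metrics`.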
5. On the Prometheus targets page, confirm that these targets show state "up".
6. In Grafana, create a Prometheus data source.
7. Download a Kafka dashboard plugin for Grafana to make panel building easier; panels can also be fully customized. The query syntax inside a panel is simply the metric names themselves.
8. Configure a self-restart (watchdog) task for Kafka:
#!/bin/bash
#yau
#restart kafka
cd /usr/local/app/kafka/bin
function stop()
{
echo "stop kafka"
PIDS=$(ps ax | grep -i 'kafka' | grep java | grep -v grep | awk '{print $1}')
if [ -z "$PIDS" ]; then
echo "No kafka server to stop"
else
kill -9 $PIDS
fi
}
function start()
{
echo "starting kafka"
KAFKA_OPTS="$KAFKA_OPTS -javaagent:../jmx_prometheus_javaagent-0.6.jar=7071:../kafka-0-8-2.yml" ./kafka-server-start.sh -daemon ../config/server.properties
}
function moni()
{
echo "moni kafka.."
if ! ps ax | grep -i 'kafka' | grep java | grep -v grep > /dev/null; then
start
fi
}

case $1 in
start)
start
;;
stop)
stop
;;
restart)
stop && start
;;
moni)
moni
;;
*)
echo "Usage: $0 [start|stop|restart|moni]"
exit 1
;;
esac

Remaining issues: the three nodes are monitored separately; how should aggregated monitoring be configured? ---- Solved: simply add more list elements to targets in the Prometheus config.
The Grafana Kafka plugin was added through the web UI and has not been merged into the image.
Prometheus temporarily lives on 151.100.
The dashboards still need to be fleshed out.
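The moni mode of the watchdog script above is meant to be run periodically. One way to drive it (a sketch; the script filename, log path, and one-minute interval are assumptions, not from the original setup) is a crontab entry:

```
# m h dom mon dow  command -- run the Kafka watchdog every minute
# (kafka_restart.sh is a hypothetical name for the script above,
#  which cd's into /usr/local/app/kafka/bin itself)
* * * * * /usr/local/app/kafka/bin/kafka_restart.sh moni >> /var/log/kafka_moni.log 2>&1
```

Install it with `crontab -e`; the script's own `cd` keeps the relative `../jmx_prometheus_javaagent-0.6.jar` and `../kafka-0-8-2.yml` paths in start() working.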