Problem description
I'm new to Prometheus, so I have only set up Prometheus with blackbox_exporter and node_exporter. I want to use node_exporter to monitor CPU and memory usage, and to monitor ping, domain names, and ports such as MySQL, MongoDB, and Nginx.
So I have two machines:
1. ServerSide (node_exporter, blackbox_exporter, prometheus, grafana, alertmanager)
2. ClientSide (node_exporter, blackbox_exporter)
Their hostnames are:
1. localhost
2. clienthost
The situation is this: I first configured a single target, as shown below.
This is my prometheus.yml:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  query_log_file: /var/log/prometheus/query.log
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # - alertmanager:9093
            - "localhost:9093"
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "alert_rules.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
#scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
#- job_name: 'prometheus'
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9090
  - job_name: 'node_exporter_metrics'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']
  - job_name: 'check_domain_status'
    params:
      module:
        - http_2xx
      target:
        - 'http://localhost'
    metrics_path: /probe
    scheme: http
    static_configs:
      - targets:
          - 'localhost:9115'
  - job_name: 'check_Nginx_port_status'
    params:
      module:
        - tcp_connect
      target:
        - 'localhost:80'
    metrics_path: /probe
    static_configs:
      - targets:
          - 'localhost:9115'
  - job_name: 'check_icmp_status'
    params:
      module:
        - icmp
      target:
        - 'localhost'
    metrics_path: /probe
    static_configs:
      - targets:
          - 'localhost:9115'
This is my alert_rules.yml file:
groups:
  - name: "node-exporter"
    rules:
      - alert: 80_PortIsDown
        expr: probe_success{job=~"check_Nginx_port_status"} == 0
        for: 1m
        labels:
          severity: 3
        annotations:
          summary: "PortIs (instance {{ $labels.instance }}) down"
          description: "PortIsDown \n VALUE = {{ $value }}\n LABELS: {{ $labels.hostname }}"
      - alert: EndpointDown
        expr: probe_success == 0
        for: 0m
        labels:
          severity: 3
        annotations:
          summary: "Endpoint {{ $labels.instance }} down"
          description: "EndpointDown \n VALUE = {{ $value }}\n LABELS: {{ $labels.hostname }}"
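A side note on these rules, offered as a sketch and not as part of the original setup: probe metrics scraped through blackbox_exporter normally carry only the `job` and `instance` labels plus whatever relabeling adds, so `{{ $labels.hostname }}` will likely render empty unless a `hostname` label is attached somewhere. The variant below prints labels that should exist on every probe series. The group name `blackbox-probes` and the 1m hold time are illustrative assumptions.

```yaml
groups:
  - name: "blackbox-probes"   # hypothetical group name, not from the original file
    rules:
      - alert: EndpointDown
        # probe_success is 1 when the probe succeeded and 0 when it failed
        expr: probe_success == 0
        # assumption: wait one minute so a single failed probe does not fire the alert
        for: 1m
        labels:
          severity: "3"
        annotations:
          summary: "Endpoint {{ $labels.instance }} down"
          # job and instance are attached by Prometheus to every scraped series
          description: "EndpointDown\n VALUE = {{ $value }}\n JOB = {{ $labels.job }}\n INSTANCE = {{ $labels.instance }}"
```

Note that `probe_success == 0` has no job filter, so it also matches the series selected by `80_PortIsDown`, and a failing port check would fire both alerts.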
When I add the client server by adding targets as shown below, I always get the (probe_success{job=~"check_Nginx_port_status"} == 0) and (probe_success == 0) alerts.
prometheus.yml with the added targets:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  query_log_file: /var/log/prometheus/query.log
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # - alertmanager:9093
            - "localhost:9093"
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "alert_rules.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
#scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
#- job_name: 'prometheus'
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9090
  - job_name: 'node_exporter_metrics'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']
      - targets: ['clienthost:9100']
  - job_name: 'blackBox_exporter_metrics'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9115']
      - targets: ['clienthost:9115']
  - job_name: 'check_domain_status'
    params:
      module:
        - http_2xx
      target:
        - 'http://localhost'
        - 'http://clienthost'
    metrics_path: /probe
    scheme: http
    static_configs:
      - targets:
          - 'localhost:9115'
          - 'clienthost:9115'
  - job_name: 'check_Nginx_port_status'
    params:
      module:
        - tcp_connect
      target:
        - 'localhost:80'
        - 'clienthost:80'
    metrics_path: /probe
    scheme: http
    static_configs:
      - targets:
          - 'localhost:9115'
          - 'clienthost:9115'
  - job_name: 'check_icmp_status'
    params:
      module:
        - icmp
      target:
        - 'localhost'
        - 'clienthost'
    metrics_path: /probe
    scheme: http
    static_configs:
      - targets:
          - 'localhost:9115'
          - 'clienthost:9115'
Can anyone help me solve this problem? Is something wrong with my yml?
Thanks, John
Solution
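One possible explanation, offered as a sketch and not as a confirmed fix: Prometheus sends every value listed under `params` with each scrape, and blackbox_exporter, as far as I know, reads only the first `target` value. Under that reading, each exporter listed in `static_configs` probes `localhost:80` from its own point of view, and `clienthost:80` is never probed. The multi-target pattern described in the blackbox_exporter README instead lists the endpoints to probe under `static_configs` and uses `relabel_configs` to copy each one into the `target` URL parameter. The sketch below applies that pattern to the port check. It assumes the probes should run from the exporter on the server (`localhost:9115`), which the question does not state.

```yaml
scrape_configs:
  - job_name: 'check_Nginx_port_status'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      # These are the endpoints to probe, not the exporters to scrape.
      - targets:
          - 'localhost:80'
          - 'clienthost:80'
    relabel_configs:
      # Pass the listed endpoint to the exporter as ?target=<endpoint>.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed endpoint as the instance label on the resulting series.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape request to the blackbox exporter.
      # Assumption: probe from the server-side exporter; use clienthost:9115
      # instead if the probe should originate from the client machine.
      - target_label: __address__
        replacement: localhost:9115
```

The same three relabel rules could be added to `check_domain_status` and `check_icmp_status`, each with its own module and endpoint list. With this layout the `instance` label carries the probed endpoint, so an alert summary that prints `{{ $labels.instance }}` names the endpoint that failed instead of the exporter address.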