Prometheus with blackbox_exporter and node_exporter | multiple targets | ICMP, domain, port | not working

Problem description

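I am new to Prometheus and have just set it up together with blackbox_exporter and node_exporter. I want to use node_exporter to monitor CPU and memory usage, and blackbox_exporter to monitor ping (ICMP), domains, and the ports of services such as MySQL, MongoDB, and Nginx.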

I have two machines:

1. Server side (node_exporter, blackbox_exporter, Prometheus, Grafana, Alertmanager)
2. Client side (node_exporter, blackbox_exporter)

1. localhost

2. clienthost

The problem does not occur when I configure a single target, as shown below.

Here is my prometheus.yml:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  query_log_file: /var/log/prometheus/query.log
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
      - "localhost:9093"

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
   - "alert_rules.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
#scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  #- job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9090
  - job_name: 'node_exporter_metrics'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']
  - job_name: 'check_domain_status'
    params:
      module:
       - http_2xx
      target:
       - 'http://localhost'
    metrics_path: /probe
    scheme: http
    static_configs:
     - targets:
        - 'localhost:9115'

  - job_name: 'check_Nginx_port_status'
    params:
      module:
       - tcp_connect
      target:
       - 'localhost:80'
    metrics_path: /probe
    static_configs:
     - targets:
        - 'localhost:9115'

  - job_name: 'check_icmp_status'
    params:
      module:
       - icmp
      target:
       - 'localhost'
    metrics_path: /probe
    static_configs:
     - targets:
        - 'localhost:9115'
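
The http_2xx, tcp_connect, and icmp modules named in the jobs above are defined in blackbox_exporter's own configuration file, which is not shown in this question. A minimal sketch of what such a file might contain, assuming it is called blackbox.yml and that the default prober settings are sufficient (the 5s timeouts are illustrative values, not taken from the question):

```yaml
# blackbox.yml (assumed file name; the actual file is not shown in the question)
modules:
  http_2xx:
    prober: http      # HTTP probe; by default succeeds on 2xx responses
    timeout: 5s       # illustrative value
  tcp_connect:
    prober: tcp       # succeeds if a TCP connection can be opened
    timeout: 5s       # illustrative value
  icmp:
    prober: icmp      # ping-style probe
    timeout: 5s       # illustrative value
```

On Linux the icmp prober usually needs extra privileges (for example the CAP_NET_RAW capability), so an ICMP probe can report failure even when the YAML itself is valid.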

Here is my alert_rules.yml file:

groups:
- name: "node-exporter"
  rules:
  - alert: 80_PortIsDown
    expr: probe_success{job=~"check_Nginx_port_status"} == 0
    for: 1m
    labels:
      severity: 3
    annotations:
      summary: "PortIs (instance {{ $labels.instance }}) down"
      description: "PortIsDown \n  VALUE = {{ $value }}\n  LABELS: {{ $labels.hostname }}"

  - alert: EndpointDown
    expr: probe_success == 0
    for: 0m
    labels:
      severity: 3
    annotations:
      summary: "Endpoint {{ $labels.instance }} down"
      description: "EndpointDown \n  VALUE = {{ $value }}\n  LABELS: {{ $labels.hostname }}"
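
Note that nothing in the prometheus.yml above attaches a hostname label to the probe_success series, so {{ $labels.hostname }} would render as an empty string in these annotations. A minimal sketch of the EndpointDown rule that uses only the job and instance labels Prometheus sets on every scraped series (the group name blackbox-probes and the 1m hold time are hypothetical choices):

```yaml
groups:
- name: "blackbox-probes"        # hypothetical group name
  rules:
  - alert: EndpointDown
    expr: probe_success == 0
    for: 1m                      # hypothetical: wait one minute before firing
    labels:
      severity: "3"
    annotations:
      summary: "Endpoint {{ $labels.instance }} down"
      # job and instance exist on every scraped series; hostname does not
      description: "EndpointDown\n  VALUE = {{ $value }}\n  JOB = {{ $labels.job }}\n  INSTANCE = {{ $labels.instance }}"
```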

When I add the client server by adding targets as shown below, both alerts, (probe_success{job=~"check_Nginx_port_status"} == 0) and (probe_success == 0), always fire.

prometheus.yml with the added targets:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  query_log_file: /var/log/prometheus/query.log
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
      - "localhost:9093"

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
   - "alert_rules.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
#scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  #- job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9090
  - job_name: 'node_exporter_metrics'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']
      - targets: ['clienthost:9100']

  - job_name: 'blackBox_exporter_metrics'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9115']
      - targets: ['clienthost:9115']

  - job_name: 'check_domain_status'
    params:
      module:
       - http_2xx
      target:
       - 'http://localhost'
       - 'http://clienthost'
    metrics_path: /probe
    scheme: http
    static_configs:
     - targets:
        - 'localhost:9115'
        - 'clienthost:9115'
  - job_name: 'check_Nginx_port_status'
    params:
      module:
       - tcp_connect
      target:
       - 'localhost:80'
       - 'clienthost:80'
    metrics_path: /probe
    scheme: http
    static_configs:
     - targets:
        - 'localhost:9115'
        - 'clienthost:9115'
  - job_name: 'check_icmp_status'
    params:
      module:
       - icmp
      target:
       - 'localhost'
       - 'clienthost'
    metrics_path: /probe
    scheme: http
    static_configs:
     - targets:
        - 'localhost:9115'
        - 'clienthost:9115'

Can anyone help me solve this? Is there something wrong with my YAML?

Thanks, John
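
For comparison with the multi-target configuration above: Prometheus sends every value listed under params with each scrape, and blackbox_exporter appears to use only a single target value per /probe request, so listing two values under params.target does not produce one probe per host. The pattern documented in the blackbox_exporter README instead lists the hosts to probe under static_configs and moves each one into the target URL parameter with relabel_configs. A minimal sketch of that pattern for the port check, assuming the probes should run from the exporter on the server side at localhost:9115:

```yaml
scrape_configs:
  - job_name: 'check_Nginx_port_status'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:                 # the endpoints to probe, not the exporter
          - 'localhost:80'
          - 'clienthost:80'
    relabel_configs:
      # Copy each listed endpoint into the ?target= URL parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed endpoint as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the exporter (assumed: server-side exporter).
      - target_label: __address__
        replacement: 'localhost:9115'
```

The same relabel_configs block could be reused for the http_2xx and icmp jobs by changing the module and the target list. With this layout the instance label holds the probed address, so an alert on probe_success == 0 shows which host failed rather than which exporter was scraped.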


grafana prometheus prometheus-alertmanager prometheus-blackbox-exporter prometheus-node-exporter