Elasticsearch - Check whether all values in a given time range are greater than threshold X

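Problem description

I want to create an alert in Kibana using an Elasticsearch query. I am using the Open Distro alerting feature. I want to check whether all values of the cpu.pct field in the last 10 minutes are greater than 50 and, if so, raise an alert.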

{
  "size": 500,
  "query": {
    "bool": {
      "filter": [
        { "match_all": { "boost": 1 } },
        {
          "match_phrase": {
            "client.id": { "query": "42", "slop": 0, "zero_terms_query": "NONE", "boost": 1 }
          }
        },
        {
          "range": {
            "cpu.pct": { "from": 10, "to": null, "include_lower": true, "include_upper": true }
          }
        },
        {
          "range": {
            "@timestamp": {
              "from": "{{period_end}}||-5m",
              "to": "{{period_end}}",
              "format": "epoch_millis",
              "boost": 1
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "aggregations": {
    "2": {
      "terms": {
        "field": "client.name.keyword",
        "size": 5,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": { "_key": "desc" }
      },
      "aggregations": {
        "3": {
          "terms": {
            "field": "component.name",
            "size": 1000,
            "order": [ { "1": "desc" }, { "_key": "asc" } ]
          },
          "aggregations": {
            "1": { "avg": { "field": "cpu.pct" } }
          }
        }
      }
    }
  }
}

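The query above computes the average, but that is not what I need, because the average cannot tell these two cases apart:

Negative case: values (100, 100, 100, 100, 100, 100, 0, 0, 0, 0) | raise alert: no (average: 60)

Positive case: values (60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60) | raise alert: yes (average: 60)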
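
To make the two cases concrete, here is a small Python sketch (not part of the original question) that computes the average and an "all values above the threshold" check for both lists. The variable names are illustrative only; the threshold of 50 is the one stated in the question.

```python
from statistics import mean

THRESHOLD = 50  # threshold from the question

# The two cases exactly as listed above.
negative = [100, 100, 100, 100, 100, 100, 0, 0, 0, 0]
positive = [60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60]

for name, values in (("negative", negative), ("positive", positive)):
    avg = mean(values)                                   # what the avg aggregation sees
    every_above = all(v > THRESHOLD for v in values)     # what the alert actually needs
    print(f"{name}: average={avg}, all above {THRESHOLD}: {every_above}")
```

Both lists average to 60, so a condition on the average alone treats them identically, while the per-value check separates them.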

How can I check all of the values?

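Solution

I'm not sure which application you are using to trigger the alert. One way to solve your problem is to use two filter aggregations:

  1. totalInLast10Min: the total number of documents indexed in the last 10 minutes.
  2. totalInLast10MinAboveTh: the number of documents indexed in the last 10 minutes whose field value is also above the threshold.

If totalInLast10Min == totalInLast10MinAboveTh, trigger the alert. You will probably also want to require totalInLast10Min > 0, since the two counts are trivially equal when the window contains no documents.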
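
The counting idea can be sketched in plain Python, independent of Elasticsearch. This is only an illustration of the logic: `should_alert` is a hypothetical helper, and the check for an empty window is an added assumption rather than something the answer itself states.

```python
def should_alert(values, threshold):
    """Mirror the two filter aggregations with plain counts."""
    total_in_window = len(values)                          # plays the role of totalInLast10Min
    total_above = sum(1 for v in values if v > threshold)  # plays the role of totalInLast10MinAboveTh
    # Assumption: an empty window should not alert, even though 0 == 0.
    return total_in_window > 0 and total_in_window == total_above


# The two cases from the question, with its threshold of 50.
print(should_alert([100, 100, 100, 100, 100, 100, 0, 0, 0, 0], 50))
print(should_alert([60] * 11, 50))
```

The counts are equal exactly when no value in the window fails the threshold, which is why comparing them is equivalent to checking every value.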

For example

Create the index

PUT test
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}

Add some documents

POST test/_doc
{"cpu":20,"timestamp":"2020-08-18 20:20:00"}

POST test/_doc
{"cpu":100,"timestamp":"2020-08-18 20:21:00"}

POST test/_doc
{"cpu":90,"timestamp":"2020-08-18 20:29:00"}

Query:

GET test/_search
{
  "size": 0,
  "aggs": {
    "totalInLast10Min": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "2020-08-18 20:20:00"
          }
        }
      }
    },"totalInLast10MinAboveTh": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "timestamp": {
                  "gte": "2020-08-18 20:20:00"
                }
              }
            },{
              "range": {
                "cpu": {
                  "gte": 80
                }
              }
            }
          ]
        }
      }
    }
  }
}
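
The query hard-codes the window start, the field names, and the threshold. A minimal Python sketch of building the same request body from parameters might look like the following. `build_all_above_query` is a hypothetical helper, and passing the date-math string "now-10m" as the window start is an assumption about how you would make the window relative; in an Open Distro monitor you would more likely keep the `{{period_end}}` placeholders from the question.

```python
import json


def build_all_above_query(time_field, value_field, threshold, window_start):
    """Build the two-filter-aggregation body used in the example above."""
    in_window = {"range": {time_field: {"gte": window_start}}}
    # "gte" matches the example; use "gt" if the threshold itself should not count.
    above_threshold = {"range": {value_field: {"gte": threshold}}}
    return {
        "size": 0,  # only the aggregation counts are needed, not the hits
        "aggs": {
            "totalInLast10Min": {"filter": in_window},
            "totalInLast10MinAboveTh": {
                "filter": {"bool": {"must": [in_window, above_threshold]}}
            },
        },
    }


# Assumed values: field names and threshold from the example, relative window start.
body = build_all_above_query("timestamp", "cpu", 80, "now-10m")
print(json.dumps(body, indent=2))
```

The printed JSON has the same shape as the query above and can be pasted into a search request or sent with whatever client you use.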

Sample result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "totalInLast10MinAboveTh" : {
      "meta" : { },
      "doc_count" : 2
    },
    "totalInLast10Min" : {
      "meta" : { },
      "doc_count" : 3
    }
  }
}
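
As a rough illustration of how the response could be turned into a yes/no decision, the sketch below reads the two `doc_count` values from a response shaped like the one above. `all_values_above` is a hypothetical helper, the `sample` dict is a trimmed copy of the result shown, and the non-empty check is an added assumption.

```python
def all_values_above(response):
    """Return True when every document in the window met the threshold."""
    aggs = response["aggregations"]
    total = aggs["totalInLast10Min"]["doc_count"]
    above = aggs["totalInLast10MinAboveTh"]["doc_count"]
    # Assumption: do not alert on an empty window, where both counts are 0.
    return total > 0 and total == above


# Trimmed copy of the sample result: 3 documents in the window, 2 at or above 80.
sample = {
    "aggregations": {
        "totalInLast10MinAboveTh": {"doc_count": 2},
        "totalInLast10Min": {"doc_count": 3},
    }
}

print(all_values_above(sample))  # False: the document with cpu 20 is below the threshold
```

In Open Distro alerting, the equivalent comparison would normally live in the monitor's trigger condition, which is a script evaluated against the query results rather than external code like this.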

Based on these two counts, you can write the condition that decides when the alert should trigger.