How to write an Elasticsearch query for distinct objects?

Problem description

I am trying to get the distinct attribute names for a given tenant_id and hierarchy_name. This is my indexed data:

       {
      "hits": [
        {
          "_index": "emp_indexs_datas_d_v","_type": "bulkindexing","_id": "84","_source": {
            "id": "2","name": "PRODUCT","values": "GEO"
          }
        },{
          "_index": "emp_indexs_datas_d_v","_id": "88","_source": {
            "id": "1","name": "CUSTOMER","values": "CUSTOMER_OPEN_1"
          }
        },"_id": "98","values": "CUSTOMER_OPEN_2"
          }
        },"_id": "100","values": "CUSTOMER-ALL"
          }
        },"_id": "99","values": "CUSTOMER_OPEN_2"
          }
      ]
    }

This is the query I have tried. It returns the distinct attribute_name values for a given hierarchy_name:

{
  "query": {
    "multi_match": {
      "query": "CUSTOMER",
      "fields": [
        "hierarchy_name"
      ]
    }
  },
  "collapse": {
    "field": "attribute_name.keyword"
  }
}

Now I also want to match on a second field, tenant_id. So far I only match on hierarchy_name. Can anyone help me with the query?

Expected output: for example, for tenant_id 2 and hierarchy_name PRODUCT we would get:

{
  "hits": [
    {
      "_index": "emp_indexs_datas_d_v","_source": {
        "tenant_id": "2","hierarchy_name": "CUSTOMER","attribute_name": "GEO"
      }
    },{
      "_index": "emp_indexs_datas_d_v","attribute_name": "CUSTOMER_OPEN_2"
      }

    }
  ]
}

Solution

You can combine multiple conditions with a bool query and its must clause:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "tenant_id": 2
          }
        },
        {
          "multi_match": {
            "query": "PRODUCT",
            "fields": [
              "hierarchy_name"
            ]
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "attribute_name.keyword"
  }
}

The search result will be:

"hits": [
      {
        "_index": "67379727","_type": "_doc","_id": "1","_score": 1.4144652,"_source": {
          "tenant_id": "2","hierarchy_name": "PRODUCT","attribute_name": "GEO"
        },"fields": {
          "attribute_name.keyword": [
            "GEO"
          ]
        }
      },{
        "_index": "67379727","_id": "3","attribute_name": "CUSTOMER_OPEN_2"
        },"fields": {
          "attribute_name.keyword": [
            "CUSTOMER_OPEN_2"
          ]
        }
      }
    ]
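
To turn a collapsed result like the one above into a plain list of distinct values, the `fields` entry of each hit can be read. The following is a minimal Python sketch, not part of the original answer. It assumes the response has already been parsed into a dict with the standard `hits.hits` envelope (the excerpt above shows only the inner list), and the function name `collapsed_values` is made up for illustration.

```python
from typing import Any, Dict, List


def collapsed_values(response: Dict[str, Any],
                     field: str = "attribute_name.keyword") -> List[Any]:
    """Collect the collapse-field value of every hit in a parsed search response."""
    values: List[Any] = []
    for hit in response["hits"]["hits"]:
        # Each collapsed hit carries the collapse key under "fields" as a list.
        values.extend(hit.get("fields", {}).get(field, []))
    return values


# Sample response trimmed to the parts the function reads (illustrative data).
sample = {
    "hits": {
        "hits": [
            {"_id": "1", "fields": {"attribute_name.keyword": ["GEO"]}},
            {"_id": "3", "fields": {"attribute_name.keyword": ["CUSTOMER_OPEN_2"]}},
        ]
    }
}
print(collapsed_values(sample))
```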

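Here is another approach, which differs from the accepted answer in three ways:

  • The analyzed match query is replaced with a non-analyzed term filter. Analyzed queries can produce unexpected or surprising results (see the match docs for an explanation).
  • The multi_match query is replaced with a term query. Using multi_match on a single field is somewhat redundant and harder to read, and it is another analyzed query.
  • collapse is replaced with a terms aggregation. This is simply how I have always done it.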

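Using a terms agg to get all values of attribute_name.keyword means we are limited to a certain number of results per shard. This can be addressed with a composite aggregation. I don't know whether the same limitation applies to collapse, but if you have a large number of distinct values it would be wise to check.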
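
As a rough illustration of the composite aggregation route, here is a minimal Python sketch that builds the request body and pages through the buckets using `after_key`. It assumes the field names used in this thread. The `search` argument is a hypothetical callable that sends a body to the index's `_search` endpoint and returns the parsed JSON response, and `composite_body` and `iter_distinct` are illustrative names, not part of any client library.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def composite_body(tenant_id: Any, hierarchy_name: str,
                   after: Optional[Dict[str, Any]] = None,
                   page_size: int = 1000) -> Dict[str, Any]:
    """Build a search body with one composite source over attribute_name.keyword."""
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [
            {"attribute_name": {"terms": {"field": "attribute_name.keyword"}}}
        ],
    }
    if after is not None:
        composite["after"] = after  # resume after the last key of the previous page
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"bool": {"must": [
            {"term": {"tenant_id": tenant_id}},
            {"term": {"hierarchy_name": hierarchy_name}},
        ]}},
        "aggs": {"distinct_attribute_names": {"composite": composite}},
    }


def iter_distinct(search: Callable[[Dict[str, Any]], Dict[str, Any]],
                  tenant_id: Any, hierarchy_name: str,
                  page_size: int = 1000) -> Iterator[Any]:
    """Yield every distinct attribute_name, one page of buckets at a time."""
    after = None
    while True:
        response = search(composite_body(tenant_id, hierarchy_name, after, page_size))
        agg = response["aggregations"]["distinct_attribute_names"]
        for bucket in agg["buckets"]:
            yield bucket["key"]["attribute_name"]
        after = agg.get("after_key")
        if after is None or not agg["buckets"]:
            break  # no further pages
```

With a real client, `search` would wrap whatever call posts the body to the index. Keeping it as a parameter makes the paging logic easy to exercise without a cluster.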

The query using term queries and a terms agg:

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "tenant_id": 2
          }
        },
        {
          "term": {
            "hierarchy_name": "PRODUCT"
          }
        }
      ]
    }
  },
  "aggs": {
    "distinct_attribute_names": {
      "terms": {
        "field": "attribute_name.keyword",
        "size": 1000
      }
    }
  },
  "size": 0
}
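
With "size": 0 the response contains no hits. The distinct values come back as bucket keys under the aggregation name. A small Python sketch of reading them, assuming the response has been parsed into a dict (the helper name `distinct_from_terms_agg` and the sample doc_count values are made up for illustration):

```python
from typing import Any, Dict, List


def distinct_from_terms_agg(response: Dict[str, Any],
                            agg_name: str = "distinct_attribute_names") -> List[Any]:
    """Return the bucket keys of a terms aggregation from a parsed search response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Illustrative response, trimmed to the aggregation part.
sample_agg_response = {
    "aggregations": {
        "distinct_attribute_names": {
            "buckets": [
                {"key": "CUSTOMER_OPEN_2", "doc_count": 2},
                {"key": "GEO", "doc_count": 1},
            ]
        }
    }
}
print(distinct_from_terms_agg(sample_agg_response))
```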