ElasticSearch: Sort top-level aggregation buckets by reverse_nested doc_count

Problem description

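I am using ElasticSearch 6.3 and working with an aggregation that has several sub-aggregations, where I need to sort the top-level aggregation buckets by the doc_count of a lower-level reverse_nested aggregation.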

My index is created like this:

PUT /myindex
{
  "mappings": {
    "default": {
      "properties": {
        "items": {
          "type": "nested",
          "properties": {
            "subitems": {
              "type": "nested",
              "properties": {
                "id": {
                  "type": "long"
                },
                "name": {
                  "type": "keyword"
                }
              }
            }
          }
        },
        "name": {
          "type": "keyword"
        }
      }
    }
  }
}

These are the sample documents I indexed:

{
  "name": "Document #1",
  "items": [
    {
      "subitems": [
        {
          "id": 1,
          "name": "Subitem #1"
        },
        {
          "id": 2,
          "name": "Subitem #2"
        }
      ]
    },
    {
      "subitems": [
        {
          "id": 2,
          "name": "Subitem #2"
        },
        {
          "id": 3,
          "name": "Subitem #3"
        }
      ]
    }
  ]
}
{
  "name": "Document #2",
  "items": [
    {
      "subitems": [
        {
          "id": 2,
          "name": "Subitem #2"
        }
      ]
    }
  ]
}
{
  "name": "Document #3","items": [
    {
      "subitems": [
        {
          "id": 3,"name": "Subitem #3"
        }
      ]
    },"name": "Subitem #2"
        }
      ]
    }
  ]
}
{
  "name": "Document #4",{
          "id": 5,"name": "Subitem #5"
        }
      ]
    }
  ]
}
{
  "name": "Document #5","name": "Subitem #2"
        }
      ]
    }
  ]
}
{
  "name": "Document #6","name": "Subitem #3"
        }
      ]
    }
  ]
}
{
  "name": "Document #7","name": "Subitem #3"
        }
      ]
    }
  ]
}
{
  "name": "Document #8","name": "Subitem #3"
        }
      ]
    }
  ]
}
{
  "name": "Document #9","name": "Subitem #3"
        }
      ]
    }
  ]
}

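I need my aggregation to return the number of documents that contain each subitem ID/name pair (assume that a given subitem ID always corresponds to the same subitem name). That is: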

id | name       | count
---+------------+------
2  | Subitem #2 | 5
3  | Subitem #3 | 6
1  | Subitem #1 | 1
5  | Subitem #5 | 1
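The count wanted here is a count of distinct root documents, not of nested subitem entries, and the two differ whenever one document contains the same subitem more than once. A minimal Python sketch of both counts, using only Document #1 and Document #2 as listed above; the helper name `count_subitems` is hypothetical and this is an in-memory illustration, not how Elasticsearch computes it internally.

```python
from collections import Counter, defaultdict

# Documents #1 and #2 from the sample data above.
DOCS = [
    {"name": "Document #1", "items": [
        {"subitems": [{"id": 1, "name": "Subitem #1"}, {"id": 2, "name": "Subitem #2"}]},
        {"subitems": [{"id": 2, "name": "Subitem #2"}, {"id": 3, "name": "Subitem #3"}]},
    ]},
    {"name": "Document #2", "items": [
        {"subitems": [{"id": 2, "name": "Subitem #2"}]},
    ]},
]


def count_subitems(docs):
    """Return (nested_counts, root_counts) keyed by subitem id.

    nested_counts: number of nested subitem entries carrying the id
                   (what a terms bucket under a nested aggregation counts).
    root_counts:   number of distinct root documents containing the id
                   (what a reverse_nested sub-aggregation counts).
    """
    nested_counts = Counter()
    roots_per_id = defaultdict(set)
    for root_index, doc in enumerate(docs):
        for item in doc["items"]:
            for sub in item["subitems"]:
                nested_counts[sub["id"]] += 1
                roots_per_id[sub["id"]].add(root_index)
    root_counts = {sub_id: len(roots) for sub_id, roots in roots_per_id.items()}
    return dict(nested_counts), root_counts


nested, roots = count_subitems(DOCS)
print(nested)  # {1: 1, 2: 3, 3: 1}
print(roots)   # {1: 1, 2: 2, 3: 1}
```

Subitem #2 appears three times but in only two documents, which is the same gap that shows up as 11 versus 5 in the full data set.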

This is the original aggregation query:

GET /myindex/default/_search
{
  "size": 0,
  "aggregations": {
    "my_nested_agg": {
      "nested": {
        "path": "items.subitems"
      },
      "aggregations": {
        "subitem_id": {
          "terms": {
            "field": "items.subitems.id"
          },
          "aggregations": {
            "subitem_name": {
              "terms": {
                "field": "items.subitems.name"
              },
              "aggregations": {
                "my_rev_agg": {
                  "reverse_nested": {}
                }
              }
            }
          }
        }
      }
    }
  }
}

The aggregation seems to return all the data I need:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "my_nested_agg": {
      "doc_count": 19,
      "subitem_id": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": 2,
            "doc_count": 11,
            "subitem_name": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Subitem #2",
                  "doc_count": 11,
                  "my_rev_agg": {
                    "doc_count": 5
                  }
                }
              ]
            }
          },
          {
            "key": 3,
            "doc_count": 6,
            "subitem_name": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Subitem #3",
                  "doc_count": 6,
                  "my_rev_agg": {
                    "doc_count": 6
                  }
                }
              ]
            }
          },
          {
            "key": 1,
            "doc_count": 1,
            "subitem_name": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Subitem #1",
                  "doc_count": 1,
                  "my_rev_agg": {
                    "doc_count": 1
                  }
                }
              ]
            }
          },
          {
            "key": 5,
            "doc_count": 1,
            "subitem_name": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Subitem #5",
                  "doc_count": 1,
                  "my_rev_agg": {
                    "doc_count": 1
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }
}

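However, the buckets are sorted in descending order of the "subitem_id" buckets' own doc_count, which counts nested subitem entries (11 for ID 2) rather than root documents.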

Instead, I need the buckets sorted in descending order of the reverse_nested sub-aggregation's doc_count, like this:

id | name       | count
---+------------+------
3  | Subitem #3 | 6
2  | Subitem #2 | 5
1  | Subitem #1 | 1
5  | Subitem #5 | 1
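The two orderings can be reproduced in a few lines of Python from the per-ID counts in the response above: the terms bucket doc_count (nested subitem entries) and the reverse_nested doc_count (root documents). Breaking ties by ascending ID is an assumption of this sketch, chosen so the result matches the table.

```python
# Per-ID counts taken from the response above:
# nested_count = subitem_id bucket doc_count, root_count = my_rev_agg doc_count.
BUCKETS = [
    {"id": 2, "name": "Subitem #2", "nested_count": 11, "root_count": 5},
    {"id": 3, "name": "Subitem #3", "nested_count": 6, "root_count": 6},
    {"id": 1, "name": "Subitem #1", "nested_count": 1, "root_count": 1},
    {"id": 5, "name": "Subitem #5", "nested_count": 1, "root_count": 1},
]


def order_by(buckets, field):
    # Descending on the chosen count; ties broken by ascending id (an assumption).
    return sorted(buckets, key=lambda b: (-b[field], b["id"]))


print([b["id"] for b in order_by(BUCKETS, "nested_count")])  # [2, 3, 1, 5]
print([b["id"] for b in order_by(BUCKETS, "root_count")])    # [3, 2, 1, 5]
```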

I tried to achieve this with the following query:

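GET /myindex/default/_search
{
  "size": 0,
  "aggregations": {
    "my_nested_agg": {
      "nested": {
        "path": "items.subitems"
      },
      "aggregations": {
        "subitem_id": {
          "terms": {
            "field": "items.subitems.id",
            "order": [
              {
                "subitem_name>my_rev_agg._count": "desc"
              }
            ]
          },
          "aggregations": {
            "subitem_name": {
              "terms": {
                "field": "items.subitems.name"
              },
              "aggregations": {
                "my_rev_agg": {
                  "reverse_nested": {}
                }
              }
            }
          }
        }
      }
    }
  }
}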

But then I got the error:

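Invalid aggregation order path [subitem_name>my_rev_agg._count]. Buckets can only be sorted on a sub-aggregator path that is built out of zero or more single-bucket aggregations within the path and a final single-bucket or a metrics aggregation at the path end. Sub-path [subitem_name] points to non single-bucket aggregation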
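The rule in this error message can be modelled with a short Python sketch. It is a simplified illustration of the rule as the message states it, not Elasticsearch's actual implementation: the aggregation tree is a hypothetical dictionary, the path is split on `>`, and an optional `.` property (such as `._count`) is stripped from the last element.

```python
SINGLE_BUCKET, MULTI_BUCKET, METRIC = "single_bucket", "multi_bucket", "metric"


def check_order_path(path, children):
    """Return None if `path` is a valid bucket order path, else an error message.

    `children` maps each sub-aggregation name to {"kind": ..., "children": {...}}.
    """
    names = path.split(">")
    # Strip an optional property suffix such as "._count" from the last element.
    names[-1] = names[-1].split(".", 1)[0]
    for position, name in enumerate(names):
        node = children.get(name)
        if node is None:
            return f"unknown aggregation [{name}]"
        is_last = position == len(names) - 1
        # Every element before the end must be a single-bucket aggregation.
        if not is_last and node["kind"] != SINGLE_BUCKET:
            return f"Sub-path [{name}] points to non single-bucket aggregation"
        # The final element must be single-bucket or metrics.
        if is_last and node["kind"] == MULTI_BUCKET:
            return f"[{name}] must be a single-bucket or metrics aggregation"
        children = node["children"]
    return None


# Hypothetical model of the sub-aggregations of "subitem_id" in the failing query...
FAILING = {
    "subitem_name": {"kind": MULTI_BUCKET, "children": {
        "my_rev_agg": {"kind": SINGLE_BUCKET, "children": {}},
    }},
}
# ...and with reverse_nested moved up to be a direct child of "subitem_id".
FIXED = {
    "subitem_name": {"kind": MULTI_BUCKET, "children": {}},
    "my_rev_agg": {"kind": SINGLE_BUCKET, "children": {}},
}

print(check_order_path("subitem_name>my_rev_agg._count", FAILING))
# Sub-path [subitem_name] points to non single-bucket aggregation
print(check_order_path("my_rev_agg", FIXED))  # None
```

The terms aggregation `subitem_name` produces many buckets per `subitem_id` bucket, so there is no single value to sort on once the path passes through it.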

Could you please advise? Thanks a lot in advance.

Solution

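I found a solution that meets my requirements. The key is to move the reverse_nested aggregation out of the terms sub-aggregation used to retrieve the name, so that it becomes a direct child of "subitem_id" alongside "subitem_name". Because reverse_nested is a single-bucket aggregation, "subitem_id" can then be ordered on it directly: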

GET /myindex/default/_search
{
  "size": 0,
  "aggregations": {
    "my_nested_agg": {
      "nested": {
        "path": "items.subitems"
      },
      "aggregations": {
        "subitem_id": {
          "terms": {
            "field": "items.subitems.id",
            "order": [
              {
                "my_rev_agg": "desc"
              }
            ]
          },
          "aggregations": {
            "subitem_name": {
              "terms": {
                "field": "items.subitems.name"
              }
            },
            "my_rev_agg": {
              "reverse_nested": {}
            }
          }
        }
      }
    }
  }
}
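If the request is generated from application code, the same body can be built as a plain Python dictionary and serialized to JSON. This is a sketch: the helper name `build_subitem_aggs` is hypothetical, and how the body is sent (any HTTP client, or the official `elasticsearch` Python client) is left open. What it encodes is that `my_rev_agg` sits directly under `subitem_id`, so the order path is a single single-bucket aggregation name.

```python
import json


def build_subitem_aggs():
    """Build the search body that orders subitem buckets by root-document count."""
    return {
        "size": 0,
        "aggregations": {
            "my_nested_agg": {
                "nested": {"path": "items.subitems"},
                "aggregations": {
                    "subitem_id": {
                        "terms": {
                            "field": "items.subitems.id",
                            # Order on the single-bucket reverse_nested child (its doc_count).
                            "order": [{"my_rev_agg": "desc"}],
                        },
                        "aggregations": {
                            # Only used to retrieve the name for each ID.
                            "subitem_name": {"terms": {"field": "items.subitems.name"}},
                            # Direct child of subitem_id, so "order" can reference it.
                            "my_rev_agg": {"reverse_nested": {}},
                        },
                    }
                },
            }
        },
    }


print(json.dumps(build_subitem_aggs(), indent=2))
```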

This returns the subitem buckets correctly sorted by the reverse_nested doc_count:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "my_nested_agg": {
      "doc_count": 19,
      "subitem_id": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": 3,
            "doc_count": 6,
            "my_rev_agg": {
              "doc_count": 6
            },
            "subitem_name": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Subitem #3",
                  "doc_count": 6
                }
              ]
            }
          },
          {
            "key": 2,
            "doc_count": 11,
            "my_rev_agg": {
              "doc_count": 5
            },
            "subitem_name": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Subitem #2",
                  "doc_count": 11
                }
              ]
            }
          },
          {
            "key": 1,
            "doc_count": 1,
            "my_rev_agg": {
              "doc_count": 1
            },
            "subitem_name": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Subitem #1",
                  "doc_count": 1
                }
              ]
            }
          },
          {
            "key": 5,
            "doc_count": 1,
            "my_rev_agg": {
              "doc_count": 1
            },
            "subitem_name": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Subitem #5",
                  "doc_count": 1
                }
              ]
            }
          }
        ]
      }
    }
  }
}
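To turn this response into the id | name | count table from the question, the buckets can be flattened in client code. A minimal Python sketch, assuming the response has already been decoded into a dictionary; `RESPONSE` is a trimmed copy of the response above containing only the fields the sketch reads, and `to_rows` is a hypothetical helper.

```python
def to_rows(response):
    """Flatten subitem_id buckets into (id, name, root_doc_count) tuples."""
    rows = []
    for bucket in response["aggregations"]["my_nested_agg"]["subitem_id"]["buckets"]:
        names = bucket["subitem_name"]["buckets"]
        # Each ID maps to one name, so take the first name bucket (if any).
        name = names[0]["key"] if names else None
        rows.append((bucket["key"], name, bucket["my_rev_agg"]["doc_count"]))
    return rows


# Trimmed copy of the response above: only the fields read by to_rows.
RESPONSE = {"aggregations": {"my_nested_agg": {"subitem_id": {"buckets": [
    {"key": 3, "my_rev_agg": {"doc_count": 6}, "subitem_name": {"buckets": [{"key": "Subitem #3"}]}},
    {"key": 2, "my_rev_agg": {"doc_count": 5}, "subitem_name": {"buckets": [{"key": "Subitem #2"}]}},
    {"key": 1, "my_rev_agg": {"doc_count": 1}, "subitem_name": {"buckets": [{"key": "Subitem #1"}]}},
    {"key": 5, "my_rev_agg": {"doc_count": 1}, "subitem_name": {"buckets": [{"key": "Subitem #5"}]}},
]}}}}

print("id | name       | count")
for sub_id, name, count in to_rows(RESPONSE):
    print(f"{sub_id:<2} | {name:<10} | {count}")
```

The rows come out in the order Elasticsearch returned them, so no client-side sorting is needed once the terms aggregation orders on `my_rev_agg`.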