Elasticsearch edge ngram does not return all expected results

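Problem description

I am having trouble tracking down unexpected results from an Elasticsearch query. The following document is indexed in Elasticsearch: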

{
  "group": "J00-I99",
  "codes": [
    { "id": "J15", "description": "hello world" },
    { "id": "J15.0", "description": "test one world" },
    { "id": "J15.1", "description": "test two world J15.0" },
    { "id": "J15.2", "description": "test two three world J15" },
    { "id": "J15.3", "description": "hello world J18 " },
    ............................ // Similar records here
    { "id": "J15.9", "description": "hello world new" },
    { "id": "J16.0", "description": "new description" }
  ]
}

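My goal here is to implement autocomplete, and I am using the n-gram approach for that. I don't want to use the completion suggester.

Currently I am facing two problems:

  1. Search query (on the id and description fields): J15

Expected result: all of the entries above that contain J15. Actual result: only a few results (J15.0, J15.1, J15.8)

  2. Search query (on the id and description fields): test two

Expected result: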

{ "id": "J15.1",

Actual result:

   { "id": "J15.0",

Here is the complete mapping:

{
    settings: {
        number_of_shards: 1,
        analysis: {
            filter: {
                ngram_filter: {
                    type: 'edge_ngram',
                    min_gram: 2,
                    max_gram: 20
                }
            },
            analyzer: {
                ngram_analyzer: {
                    type: 'custom',
                    tokenizer: 'standard',
                    filter: ['lowercase', 'ngram_filter']
                }
            }
        }
    },
    mappings: {
        properties: {
            group: {
                type: 'text'
            },
            codes: {
                type: 'nested',
                properties: {
                    id: {
                        type: 'text',
                        analyzer: 'ngram_analyzer',
                        search_analyzer: 'standard'
                    },
                    description: {
                        type: 'text',
                        search_analyzer: 'standard'
                    }
                }
            }
        }
    }
}
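
To make the index-time behaviour of ngram_analyzer easier to reason about, here is a rough Python approximation of what it emits for an id. This is only a sketch: it splits on whitespace instead of running the real standard tokenizer, and the function name edge_ngrams_for is made up for illustration.

```python
def edge_ngrams_for(text, min_gram=2, max_gram=20):
    """Approximate ngram_analyzer: split, lowercase, then emit edge n-grams.

    Assumption: whitespace splitting stands in for the standard tokenizer.
    """
    grams = []
    for token in text.lower().split():
        longest = min(len(token), max_gram)
        # Emit every prefix of the token from min_gram up to its full length.
        for size in range(min_gram, longest + 1):
            grams.append(token[:size])
    return grams


if __name__ == "__main__":
    print(edge_ngrams_for("J15.1"))
```

Under these assumptions every id that starts with J15 is indexed with the prefix j15, which is the token the standard search analyzer produces for the search term J15.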

Search query

GET myindex/_search
{
  "_source": {
    "excludes": [
      "codes"
    ]
  },"query": {
    "nested": {
      "path": "codes","query": {
        "bool": {
          "should": [
            {
              "match": {
                "codes.description": "J15"
              }
            },{
              "match": {
                "codes.id": "J15"
              }
            }
          ]
        }
      },"inner_hits": {}
    }
  }
}

Note: the real index will be large. Only sample data is shown here.

For the second problem, can I use multi_match with the AND operator, as shown below?

GET myindex/_search
{
  "_source": {
    "excludes": [
      "codes"
    ]
  },
  "query": {
    "nested": {
      "path": "codes",
      "query": {
        "bool": {
          "should": [
            {
              "multi_match": {
                "query": "J15",
                "fields": ["codes.id", "codes.description"],
                "operator": "and"
              }
            }
          ]
        }
      },
      "inner_hits": {}
    }
  }
}

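I am struggling to solve this, so any help would be greatly appreciated.

Solution

The problem is that, by default, inner_hits returns only 3 matching documents, as stated in the official documentation:

size

The maximum number of hits to return per inner_hits. By default the top three matching hits are returned.

Just add the size parameter to your inner_hits to get all the search results.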

  "inner_hits": {
                "size": 10 // note this
            }
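
For readers who build the request in code, the following Python sketch assembles the nested query from the question with an explicit inner_hits size. The helper name nested_codes_query and its default of 10 are illustrative assumptions, not part of any Elasticsearch client API; the function only returns a plain dict.

```python
import json


def nested_codes_query(term, inner_hits_size=10):
    """Build the nested query from the question with an explicit inner_hits size."""
    return {
        "_source": {"excludes": ["codes"]},
        "query": {
            "nested": {
                "path": "codes",
                "query": {
                    "bool": {
                        "should": [
                            {"match": {"codes.description": term}},
                            {"match": {"codes.id": term}},
                        ]
                    }
                },
                # Without "size", only the top 3 inner hits come back.
                "inner_hits": {"size": inner_hits_size},
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be sent with GET myindex/_search.
    print(json.dumps(nested_codes_query("J15"), indent=2))
```

As far as I know, Elasticsearch also caps the inner hits window through the index.max_inner_result_window index setting (100 by default), so very large size values may be rejected.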

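I tried this on your sample data. With size set, the first query, which previously returned only 3 inner hits, now returns all of the matching entries.

Search results for the first query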

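   "hits": [
       {
           "_index": "myindexedge64170045",
           "_type": "_doc",
           "_id": "1",
           "_nested": { "field": "codes", "offset": 2 },
           "_score": 1.8687118,
           "_source": { "id": "J15.1", "description": "test two world J15.0" }
       },
       {
           "_index": "myindexedge64170045",
           "_type": "_doc",
           "_id": "1",
           "_nested": { "field": "codes", "offset": 3 },
           "_score": 1.7934312,
           "_source": { "id": "J15.2", "description": "test two three world J15" }
       },
       {
           "_index": "myindexedge64170045",
           "_type": "_doc",
           "_id": "1",
           "_nested": { "field": "codes", "offset": 0 },
           "_score": 0.29618382,
           "_source": { "id": "J15", "description": "hello world" }
       },
       {
           "_index": "myindexedge64170045",
           "_type": "_doc",
           "_id": "1",
           "_nested": { "field": "codes", "offset": 1 },
           "_source": { "id": "J15.0", "description": "test one world" }
       },
       {
           "_index": "myindexedge64170045",
           "_type": "_doc",
           "_id": "1",
           "_nested": { "field": "codes", "offset": 4 },
           "_source": { "id": "J15.3", "description": "hello world J18 " }
       },
       {
           "_index": "myindexedge64170045",
           "_type": "_doc",
           "_id": "1",
           "_nested": { "field": "codes", "offset": 5 },
           "_source": { "id": "J15.9", "description": "hello world new" }
       }
   ]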

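I am adding another answer because this is a separate issue; the first answer focused on the first problem.

The problem is that your second query, test two, also returns test one world. At index time you use ngram_analyzer, which relies on the standard tokenizer and therefore splits the text on whitespace, and your search analyzer is again standard. If you run the Analyze API on the indexed document and on the search term, you will see that they share a token: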

{
   "text" : "test one world","analyzer" : "standard"
}

which produces the tokens

{
    "tokens": [
        {
            "token": "test","start_offset": 0,"end_offset": 4,"type": "<ALPHANUM>","position": 0
        },{
            "token": "one","start_offset": 5,"end_offset": 8,"position": 1
        },{
            "token": "world","start_offset": 9,"end_offset": 14,"position": 2
        }
    ]
}

And for your search term test two:

{
    "tokens": [
        {
            "token": "test","position": 0
        },{
            "token": "two","position": 1
        }
    ]
}

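As you can see, the test token is present in the document, so you get that search result. The default operator for a match is OR, so a single shared token is enough. This can be fixed by using the AND operator in the query, as shown below.

Search query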

{
    "_source": {
        "excludes": [
            "codes"
        ]
    },"query": {
        "nested": {
            "path": "codes","query": {
                "bool": {
                    "must": {
                        "multi_match": {
                            "query": "test two","fields": [
                                "codes.id","codes.description"
                            ],"operator" :"AND"
                        }
                    }
                }
            },"inner_hits": {}
        }
    }
}

And the search results

 "hits": [
     {
         "_index": "myindexedge64170045",
         ...
         "_score": 2.6901608,
         ...
     },
     {
         ...
         "_score": 2.561376,
         "_source": {
             "id": "J15.2",
             "description": "test two three world J15"
         }
     }
 ]
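
The difference between the default OR behaviour and the AND operator can be simulated in a few lines of Python. This is a sketch under loud assumptions: lowercasing plus whitespace splitting stands in for the standard analyzer, only the description field is considered, and the names tokens, matches and DESCRIPTIONS are invented for the example.

```python
def tokens(text):
    """Assumption: lowercase + whitespace split approximates the standard analyzer."""
    return text.lower().split()


def matches(query, field_text, operator="or"):
    """Return True if field_text would match query under the given operator."""
    field_tokens = set(tokens(field_text))
    found = [token in field_tokens for token in tokens(query)]
    # "and" requires every query token, "or" is satisfied by any one of them.
    return all(found) if operator == "and" else any(found)


# Descriptions copied from the sample document in the question.
DESCRIPTIONS = {
    "J15.0": "test one world",
    "J15.1": "test two world J15.0",
    "J15.2": "test two three world J15",
}

if __name__ == "__main__":
    for operator in ("or", "and"):
        matched = [
            code_id
            for code_id, text in DESCRIPTIONS.items()
            if matches("test two", text, operator)
        ]
        print(operator, matched)
```

With OR, the shared test token is enough for J15.0 to match; with AND, only the descriptions containing both test and two remain.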

Adding a working example with index mapping, search query and search results

Index mapping:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },"tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram","min_gram": 2,"max_gram": 20,"token_chars": [
            "letter","digit"
          ]
        }
      }
    },"max_ngram_diff": 50
  },"mappings": {
    "properties": {
      "group": {
        "type": "text"
      },"codes": {
        "type": "nested","properties": {
          "id": {
            "type": "text","analyzer": "my_analyzer"
          }
        }
      }
    }
  }
}
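
This mapping uses an edge_ngram tokenizer with token_chars rather than a token filter on top of the standard tokenizer, so ids are split differently. The Python sketch below approximates that behaviour; it assumes that characters outside letter and digit act as separators and that chunks shorter than min_gram produce no tokens. The function name edge_ngram_tokenizer is hypothetical.

```python
def edge_ngram_tokenizer(text, min_gram=2, max_gram=20):
    """Approximate an edge_ngram tokenizer with token_chars = [letter, digit]."""
    grams = []
    word = []
    # The trailing space flushes the last chunk.
    for ch in text + " ":
        if ch.isalpha() or ch.isdigit():
            word.append(ch)
            continue
        chunk = "".join(word)
        word = []
        # Emit prefixes of the chunk; chunks shorter than min_gram yield nothing.
        for size in range(min_gram, min(len(chunk), max_gram) + 1):
            grams.append(chunk[:size])
    return grams


if __name__ == "__main__":
    for code_id in ("J15", "J15.1", "J16.0"):
        print(code_id, edge_ngram_tokenizer(code_id))
```

Because this mapping sets no search_analyzer, the query text goes through the same analyzer as the indexed ids. There is also no lowercase filter here, so in this approximation the tokens keep their original case.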

Index data:

{
    "group": "J00-I99","codes": [
        {
            "id": "J15","description": "hello world"
        },{
            "id": "J15.0","description": "test one world"
        },{
            "id": "J15.1","description": "test two world J15.0"
        },{
            "id": "J15.2","description": "test two three world J15"
        },{
            "id": "J15.3","description": "hello world J18 "
        },{
            "id": "J15.9","description": "hello world new"
        },{
            "id": "J16.0","description": "new description"
        }
    ]
}

Search query:

{
    "_source": {
        "excludes": [
            "codes"
        ]
    },
    "query": {
        "nested": {
            "path": "codes",
            "query": {
                "bool": {
                    "should": [
                        { "match": { "codes.description": "J15" } },
                        { "match": { "codes.id": "J15" } }
                    ],
                    "must": {
                        "multi_match": {
                            "query": "test two",
                            "fields": [
                                "codes.id",
                                "codes.description"
                            ],
                            "type": "phrase"
                        }
                    }
                }
            },
            "inner_hits": {}
        }
    }
}

Search results:

"inner_hits": {
          "codes": {
            "hits": {
              "total": {
                "value": 2,"relation": "eq"
              },"max_score": 3.2227304,"hits": [
                {
                  "_index": "stof_64170045","_nested": {
                    "field": "codes","offset": 3
                  },"_score": 3.2227304,"_source": {
                    "id": "J15.2","description": "test two three world J15"
                  }
                },{
                  "_index": "stof_64170045","_nested": {
                    "field": "codes","offset": 2
                  },"_score": 2.0622847,"_source": {
                    "id": "J15.1","description": "test two world J15.0"
                  }
                }
              ]
            }
          }
        }
      }
