Boolean similarity: is there a way to remove duplicates?

Problem description

Given the following index:

PUT /test_index
{
    "mappings": {
        "properties": {
            "field1": {
                "type": "text",
                "analyzer": "whitespace",
                "similarity": "boolean"
            },
            "field2": {
                "type": "text",
                "similarity": "boolean"
            }
        }
    }
}

and the following data:

POST /test_index/_bulk?refresh=true
{ "index" : {} }
{ "field1": "foo","field2": "bar"}
{ "index" : {} }
{ "field1": "foo1 foo2","field2": "bar1 bar2"}
{ "index" : {} }
{ "field1": "foo1 foo2 foo3","field2": "bar1 bar2 bar3"}

and the following query against the boolean-similarity fields:

POST /test_index/_search
{
    "size": 10,
    "min_score": 0.4,
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "should": [
                        {
                            "fuzzy": {
                                "field1": {
                                    "value": "foo",
                                    "fuzziness": "AUTO",
                                    "boost": 1
                                }
                            }
                        },
                        {
                            "fuzzy": {
                                "field2": {
                                    "value": "bar",
                                    "boost": 1
                                }
                            }
                        }
                    ]
                }
            }
        }
    }
}

I always get ["foo1 foo2 foo3", "bar1 bar2 bar3"] as the top hit, even though the index contains an exact match (the first document):

{
    "took": 114,
    "timed_out": false,
    "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
    "hits": {
        "total": { "value": 3, "relation": "eq" },
        "max_score": 3.9999998,
        "hits": [
            {
                "_index": "test_index",
                "_type": "_doc",
                "_id": "bXw8eXUBCTtfNv84bNPr",
                "_score": 3.9999998,
                "_source": { "field1": "foo1 foo2 foo3", "field2": "bar1 bar2 bar3" }
            },
            {
                "_index": "test_index",
                "_type": "_doc",
                "_id": "bHw8eXUBCTtfNv84bNPr",
                "_score": 2.6666665,
                "_source": { "field1": "foo1 foo2", "field2": "bar1 bar2" }
            },
            {
                "_index": "test_index",
                "_type": "_doc",
                "_id": "a3w8eXUBCTtfNv84bNPr",
                "_score": 2.0,
                "_source": { "field1": "foo", "field2": "bar" }
            }
        ]
    }
}
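
The ranking above can be reproduced by hand, which may make the "duplicate" effect easier to see. The Python sketch below is only a model, not Elasticsearch code. It assumes that with boolean similarity every distinct token matched by a fuzzy clause adds boost × (1 − edits / min(query length, token length)) to the score, and that fuzziness AUTO allows one edit for terms of three to five characters. The weighting formula is an assumption chosen because it reproduces the scores reported here; the helper names (edit_distance, auto_max_edits, fuzzy_clause_score) are made up for illustration.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (no transpositions), enough for this data."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def auto_max_edits(term):
    """Edits allowed by fuzziness AUTO: 0 for length 0-2, 1 for 3-5, 2 above."""
    if len(term) <= 2:
        return 0
    return 1 if len(term) <= 5 else 2


def fuzzy_clause_score(query_term, text, boost=1.0):
    """Add up the assumed contribution of every distinct matching token.

    Splitting on whitespace stands in for the analyzers; for this sample
    data it yields the same tokens.
    """
    total = 0.0
    for token in set(text.split()):
        edits = edit_distance(query_term, token)
        if edits <= auto_max_edits(query_term):
            # Assumed weight: exact tokens count 1.0, near matches count less.
            total += boost * (1.0 - edits / min(len(query_term), len(token)))
    return total


docs = [
    {"field1": "foo", "field2": "bar"},
    {"field1": "foo1 foo2", "field2": "bar1 bar2"},
    {"field1": "foo1 foo2 foo3", "field2": "bar1 bar2 bar3"},
]

# The bool/should query sums the two fuzzy clauses.
scores = [
    fuzzy_clause_score("foo", d["field1"]) + fuzzy_clause_score("bar", d["field2"])
    for d in docs
]
for d, s in zip(docs, scores):
    print(f"{d['field1']!r}: {s:.4f}")
```

Under these assumptions the three documents come out at roughly 2.0, 2.67 and 4.0, in line with the reported _score values: each near match is worth less than the exact token, but three near matches per field together outweigh one exact match.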

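I know that boolean similarity scores a document higher the more terms it matches, and I know I could use rescoring here, but that is not an option because I don't know how many top-N results to fetch.

Are there other options? Perhaps I could write my own similarity plugin, based on boolean similarity, that removes the duplicates and keeps only the best-matching token, but I don't know where to start; the only examples I can find cover scripting and rescoring.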

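Solution

Update: I have revised this answer based on the clarification given in the comments on my previous answer.

The following query returns the expected results: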

{
    "min_score": 0.4,
    "size": 10,
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "should": [
                        {
                            "fuzzy": {
                                "field1": {
                                    "value": "foo",
                                    "fuzziness": "AUTO",
                                    "boost": 0.5
                                }
                            }
                        },
                        {
                            "term": {
                                "field1": {
                                    "value": "foo",
                                    "boost": 1.5
                                }
                            }
                        }
                    ]
                }
            }
        }
    }
}

The term clause is there to reward exact terms, and its boost of 1.5 (against 0.5 on the fuzzy clause) lifts the exact match further.

Search result:

"hits": [
    {
        "_index": "test_index",
        "_type": "_doc",
        "_id": "zdMEvHUBlo4-1mHbtvNH",
        "_score": 2.0,
        "_source": { "field1": "foo", "field2": "bar" }
    },
    {
        "_index": "test_index",
        "_type": "_doc",
        "_id": "z9MEvHUBlo4-1mHbtvNH",
        "_score": 0.99999994,
        "_source": { "field1": "foo1 foo2 foo3", "field2": "bar1 bar2 bar3" }
    },
    {
        "_index": "test_index",
        "_type": "_doc",
        "_id": "ztMEvHUBlo4-1mHbtvNH",
        "_score": 0.6666666,
        "_source": { "field1": "foo1 foo2", "field2": "bar1 bar2" }
    }
]
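
The scores 2.0, 0.99999994 and 0.6666666 can be checked with a little arithmetic. The sketch below is a model rather than Elasticsearch code: it assumes that inside the fuzzy clause an exact token counts 1.0 and a token one edit away from the three-letter term "foo" counts 1 − 1/3, an assumption that happens to match these results. The boosts 0.5 and 1.5 come from the query with a fuzzy and a boosted term clause on field1; the function name field1_score and the token counts are illustrative.

```python
# Assumed per-token weights inside the fuzzy clause (not taken from the source).
EXACT_WEIGHT = 1.0
ONE_EDIT_WEIGHT = 1.0 - 1.0 / 3.0

FUZZY_BOOST = 0.5  # boost on the fuzzy clause
TERM_BOOST = 1.5   # boost on the term clause


def field1_score(exact_tokens, one_edit_tokens):
    """Modelled score from the number of distinct matching field1 tokens."""
    fuzzy_part = FUZZY_BOOST * (
        exact_tokens * EXACT_WEIGHT + one_edit_tokens * ONE_EDIT_WEIGHT
    )
    # The term clause only fires when the exact token is present.
    term_part = TERM_BOOST if exact_tokens > 0 else 0.0
    return fuzzy_part + term_part


# (exact tokens, one-edit tokens) for the three sample documents
sample_docs = {
    "foo": (1, 0),
    "foo1 foo2": (0, 2),
    "foo1 foo2 foo3": (0, 3),
}
modelled = {text: field1_score(*counts) for text, counts in sample_docs.items()}
for text, value in modelled.items():
    print(f"{text!r}: {value:.4f}")
```

If the assumption holds, the boost reduces the effect rather than removing it: in this model a document would need seven or more distinct one-edit tokens in field1 before it outranks the exact match.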

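Another query, this time without a boost on the exact term, also returns the expected results (note that the term clause carries no boost):

{
    "min_score": 0.4,
    ...
                        {
                            "term": {
                                "field1": {
                                    "value": "foo"
                                }
                            }
                        }
                    ]
                }
            }
        }
    }
}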
