Problem description
I am using django_elasticsearch_dsl.
My document:
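from elasticsearch_dsl import analyzer
import django_elasticsearch_dsl
from django_elasticsearch_dsl import fields
# Custom analyzer: strip HTML, then lowercase, drop stop words, and stem.
html_strip = analyzer(
    'html_strip', tokenizer='standard',
    filter=["lowercase", "stop", "snowball"], char_filter=["html_strip"],
)
class Document(django_elasticsearch_dsl.Document):
    name = fields.TextField(
        analyzer=html_strip,
        fields={
            'raw': fields.KeywordField(),
            'suggest': fields.CompletionField(),  # queried as name.suggest
        },
    )
    ...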
My request:
_search = Document.search().suggest("suggestions", text=query, completion={'field': 'name.suggest'}).execute()
I have indexed documents with the following names:
"This is a test"
"this is my test"
"this test"
"Test this"
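Now, if I search for This is my test, I only get back
"this is my test"
But if I search for test, all I get is
"Test this"
even though I want all of the documents, since each of them has test in its name.
What am I missing?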
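Solution
The best way to get completion suggestions that can also match in the middle of a field is an n-gram filter.
Alternatively, you can use multiple suggesters: one based on the prefix, and one based on a regex for matches in the middle of the field.
I am not familiar with django_elasticsearch_dsl, so here is a working example with the index mapping, index data, search query, and search result.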
Index mapping:
{
  "mappings": {
    "properties": {
      "name": {
        "type": "completion"
      }
    }
  }
}
Index data:
{
  "name": {
    "input": ["Test this"]
  }
}
{
  "name": {
    "input": ["this is my test"]
  }
}
{
  "name": {
    "input": ["This is a test"]
  }
}
{
  "name": {
    "input": ["this test"]
  }
}
Search query:
{
  "suggest": {
    "suggest-exact": {
      "prefix": "test",
      "completion": {
        "field": "name",
        "skip_duplicates": true
      }
    },
    "suggest-regex": {
      "regex": ".*test.*",
      "completion": {
        "field": "name",
        "skip_duplicates": true
      }
    }
  }
}
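The same request body can be assembled in Python before it is sent to Elasticsearch. The following is only a minimal sketch: `build_suggest_body` is a hypothetical helper name, the field name `name` is taken from the mapping above, and the user's text is placed into the regex without escaping, so it assumes plain alphanumeric input.

```python
import json


def build_suggest_body(text, field="name"):
    """Build a suggest body with a prefix suggester and a regex suggester.

    Hypothetical helper: `text` is assumed to contain no regex metacharacters.
    """
    completion = {"field": field, "skip_duplicates": True}
    return {
        "suggest": {
            # Matches suggestions that start with `text`.
            "suggest-exact": {"prefix": text, "completion": dict(completion)},
            # Matches suggestions that contain `text` anywhere.
            "suggest-regex": {"regex": f".*{text}.*", "completion": dict(completion)},
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_suggest_body("test"), indent=2))
```

The resulting dictionary serializes to the JSON shown above, so it can be passed as the body of a search request by whichever client you use.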
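Search result:
"suggest": {
  "suggest-exact": [
    {
      "text": "test",
      "offset": 0,
      "length": 4,
      "options": [
        {
          "text": "Test this",
          "_index": "stof_64281341",
          "_type": "_doc",
          "_id": "4",
          "_score": 1.0,
          "_source": {
            "name": {
              "input": ["Test this"]
            }
          }
        }
      ]
    }
  ],
  "suggest-regex": [
    {
      "text": ".*test.*",
      "offset": 0,
      "length": 8,
      "options": [
        {
          "text": "Test this",
          "_index": "stof_64281341",
          "_type": "_doc",
          "_id": "4",
          "_score": 1.0,
          "_source": {
            "name": {
              "input": ["Test this"]
            }
          }
        },
        {
          "text": "This is a test",
          "_index": "stof_64281341",
          "_type": "_doc",
          "_id": "1",
          "_score": 1.0,
          "_source": {
            "name": {
              "input": ["This is a test"]
            }
          }
        },
        {
          "text": "this is my test",
          "_index": "stof_64281341",
          "_type": "_doc",
          "_id": "2",
          "_score": 1.0,
          "_source": {
            "name": {
              "input": ["this is my test"]
            }
          }
        },
        {
          "text": "this test",
          "_index": "stof_64281341",
          "_type": "_doc",
          "_id": "3",
          "_score": 1.0,
          "_source": {
            "name": {
              "input": ["this test"]
            }
          }
        }
      ]
    }
  ]
}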
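To see why the prefix suggester alone returns only "Test this" while the regex suggester returns all four names, here is a simplified model in plain Python. It is not how Elasticsearch implements completion internally; it only assumes that the whole input is lowercased and then compared either from its start (prefix) or against the full string (regex). The function names are made up for this illustration.

```python
import re

NAMES = ["This is a test", "this is my test", "this test", "Test this"]


def prefix_matches(query, names):
    """Keep names whose lowercased text starts with the lowercased query."""
    q = query.lower()
    return [n for n in names if n.lower().startswith(q)]


def regex_matches(pattern, names):
    """Keep names whose lowercased text matches the pattern in full."""
    rx = re.compile(pattern)
    return [n for n in names if rx.fullmatch(n.lower())]


if __name__ == "__main__":
    print(prefix_matches("test", NAMES))             # ['Test this']
    print(prefix_matches("This is my test", NAMES))  # ['this is my test']
    print(regex_matches(".*test.*", NAMES))          # all four names
```

Under this model the behaviour from the question is expected: only one name begins with "test", so a prefix-only suggester has nothing else to return.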
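Based on the user's comment, here is another answer that uses n-grams.
It is a working example with the index mapping, index data, search query, and search result.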
Index mapping:
{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 4,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    },
    "max_ngram_diff": 50
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
Index data:
{
  "name": ["Test this"]
}
{
  "name": ["This is a test"]
}
{
  "name": ["this is my test"]
}
{
  "name": ["this test"]
}
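Analyze API (run against the index, because ngram_analyzer is defined in that index's settings):
POST /stof_64281341/_analyze
{
  "analyzer": "ngram_analyzer",
  "text": "this is my test"
}
This generates the following tokens:
{
  "tokens": [
    {
      "token": "this",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "test",
      "start_offset": 11,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}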
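The token list is short because "is" and "my" are shorter than min_gram and therefore produce no n-grams. The sketch below approximates the analyzer in plain Python to make that visible. It assumes that splitting on `\w+` is close enough to the standard tokenizer for this input, and `ngram_analyze` is a name invented for the illustration.

```python
import re


def ngram_analyze(text, min_gram=4, max_gram=20):
    """Approximate ngram_analyzer: lowercase words, then n-grams per word."""
    tokens = []
    for word in re.findall(r"\w+", text.lower()):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(word):
                    break  # no longer n-gram fits from this start position
                tokens.append(word[start:start + size])
    return tokens


if __name__ == "__main__":
    print(ngram_analyze("this is my test"))  # ['this', 'test']
    print(ngram_analyze("Testing"))
```

For a longer word such as "Testing", the output also contains inner fragments like "esti" and "ting", which is what allows matches in the middle of a word.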
Search query:
{
  "query": {
    "match": {
      "name": "test"
    }
  }
}
Search result:
"hits": [
  {
    "_index": "stof_64281341",
    "_score": 0.2876821,
    "_source": {
      "name": ["Test this"]
    }
  },
  {
    "_index": "stof_64281341",
    "_source": {
      "name": ["this is my test"]
    }
  },
  {
    "_index": "stof_64281341",
    "_source": {
      "name": ["This is a test"]
    }
  },
  {
    "_index": "stof_64281341",
    "_source": {
      "name": ["this test"]
    }
  }
]
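All four documents match because the query text is analyzed with the standard analyzer (no n-grams) and then looked up among the n-grams stored at index time. A rough model of that lookup follows. It is a sketch under the same simplifying assumptions as the mapping above (word splitting on `\w+`, min_gram 4, max_gram 20), and all function names are hypothetical.

```python
import re

NAMES = ["Test this", "this is my test", "This is a test", "this test"]


def index_terms(name, min_gram=4, max_gram=20):
    """Terms stored at index time: n-grams of each lowercased word."""
    terms = set()
    for word in re.findall(r"\w+", name.lower()):
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            for start in range(len(word) - size + 1):
                terms.add(word[start:start + size])
    return terms


def search_terms(query):
    """Terms produced at search time: lowercased words, no n-grams."""
    return set(re.findall(r"\w+", query.lower()))


def match(query, names):
    """A name matches if it shares at least one term with the query."""
    return [n for n in names if index_terms(n) & search_terms(query)]


if __name__ == "__main__":
    print(match("test", NAMES))  # all four names
    print(match("tes", NAMES))   # []
```

One consequence of min_gram being 4 is that a three-letter query such as "tes" finds nothing, so lower min_gram if shorter inputs should produce matches.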
For fuzzy search, you can use the following search query (it uses tst in place of test):
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "tst"
      }
    }
  }
}
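The fuzzy query matches indexed terms that are within a small edit distance of the given value. With the default AUTO fuzziness, values of 3 to 5 characters may differ by one edit, which is why "tst" can reach the indexed term "test". Here is a minimal sketch of that check. It uses plain Levenshtein distance, whereas Elasticsearch also counts a transposition of adjacent characters as a single edit by default, and the function names are invented for the example.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def auto_fuzziness(value):
    """Allowed edits under AUTO: 0 for 1-2 chars, 1 for 3-5, 2 for longer."""
    if len(value) <= 2:
        return 0
    if len(value) <= 5:
        return 1
    return 2


def fuzzy_match(value, indexed_term):
    return levenshtein(value, indexed_term) <= auto_fuzziness(value)


if __name__ == "__main__":
    print(fuzzy_match("tst", "test"))  # True
    print(fuzzy_match("tst", "this"))  # False
```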