Elasticsearch suggester full-text search

Question

I am using django_elasticsearch_dsl.

My document:

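import django_elasticsearch_dsl
from django_elasticsearch_dsl import fields
from elasticsearch_dsl import analyzer

# Custom analyzer: strip HTML, then lowercase, remove stop words, and stem.
html_strip = analyzer(
    'html_strip',
    tokenizer='standard',
    filter=["lowercase", "stop", "snowball"],
    char_filter=["html_strip"],
)

class Document(django_elasticsearch_dsl.Document):
    name = fields.TextField(
        analyzer=html_strip,
        fields={
            'raw': fields.KeywordField(),
            'suggest': fields.CompletionField(),
        },
    )
    ...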

My request:

_search = Document.search().suggest("suggestions",text=query,completion={'field': 'name.suggest'}).execute()

I indexed documents with the following names:

"This is a test"
"this is my test"
"this test"
"Test this"

Now, if I search for This is my test, I only get

"this is my test"

However, if I search for test, all I get is

"Test this"

even though I want all of the documents, since every one of them has test in its name.

What am I missing?

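Answer

The best way to get suggestions that can also match in the middle of a field is an n-gram filter.

Alternatively, you can combine several suggesters: one that is prefix-based, and one that uses a regex for matches in the middle of the field.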
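To see why the prefix suggester alone returns only "Test this", here is a rough plain-Python approximation of the two matching modes. It is not how Elasticsearch implements suggesters internally; lowercasing stands in for the analyzer on the completion field, and the helper names are made up for this illustration.

```python
import re

NAMES = ["This is a test", "this is my test", "this test", "Test this"]

def prefix_matches(names, prefix):
    """Names whose lowercased form starts with the lowercased prefix."""
    p = prefix.lower()
    return [n for n in names if n.lower().startswith(p)]

def regex_matches(names, pattern):
    """Names whose lowercased form matches the whole pattern."""
    rx = re.compile(pattern)
    return [n for n in names if rx.fullmatch(n.lower())]

print(prefix_matches(NAMES, "test"))     # only the name that starts with "test"
print(regex_matches(NAMES, ".*test.*"))  # every name that contains "test"
```

Only one name begins with test, so the prefix mode finds one document, while the regex mode finds all four.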

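I am not familiar with django_elasticsearch_dsl, so I have added a working example with the index mapping, index data, search query, and search result.

Index mapping: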

{
  "mappings": {
    "properties": {
      "name": {
        "type": "completion"
      }
    }
  }
}

Index data:

{
  "name": {
    "input": ["Test this"]
  }
}
{
  "name": {
    "input": ["this is my test"]
  }
}
{
  "name": {
    "input": ["This is a test"]
  }
}
{
  "name": {
    "input": ["this test"]
  }
}

Search query:

{
  "suggest": {
    "suggest-exact": {
      "prefix": "test",
      "completion": {
        "field": "name",
        "skip_duplicates": true
      }
    },
    "suggest-regex": {
      "regex": ".*test.*",
      "completion": {
        "field": "name",
        "skip_duplicates": true
      }
    }
  }
}
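On the Python side, the same request body could be assembled from the user's input along these lines. This is only a sketch: build_suggest_body is a hypothetical helper, the field name name.suggest is taken from the mapping in the question, and the query is lowercased on the assumption that the completion field's analyzer lowercases its input. The query is not escaped for Lucene regex syntax.

```python
import json

def build_suggest_body(query, field="name"):
    """Return a suggest request body with a prefix and a regex suggester."""
    completion = {"field": field, "skip_duplicates": True}
    return {
        "suggest": {
            "suggest-exact": {
                "prefix": query,
                "completion": dict(completion),
            },
            "suggest-regex": {
                # Assumption: indexed inputs are lowercased by the analyzer,
                # so the regex is built from the lowercased query.
                # Special regex characters in `query` are NOT escaped here.
                "regex": ".*" + query.lower() + ".*",
                "completion": dict(completion),
            },
        }
    }

body = build_suggest_body("test", field="name.suggest")
print(json.dumps(body, indent=2))
```

The resulting dictionary can then be sent as the body of a search request with whichever client you use.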

Search result:

"suggest": {
    "suggest-exact": [
      {
        "text": "test",
        "offset": 0,
        "length": 4,
        "options": [
          {
            "text": "Test this",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "4",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "Test this"
                ]
              }
            }
          }
        ]
      }
    ],
    "suggest-regex": [
      {
        "text": ".*test.*",
        "length": 8,
        "options": [
          {
            "text": "Test this",
            "_source": {
              "name": {
                "input": [
                  "Test this"
                ]
              }
            }
          },
          {
            "text": "This is a test",
            "_id": "1",
            "_source": {
              "name": {
                "input": [
                  "This is a test"
                ]
              }
            }
          },
          {
            "text": "this is my test",
            "_id": "2",
            "_source": {
              "name": {
                "input": [
                  "this is my test"
                ]
              }
            }
          },
          {
            "text": "this test",
            "_id": "3",
            "_source": {
              "name": {
                "input": [
                  "this test"
                ]
              }
            }
          }
        ]
      }
    ]
}
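Since "Test this" comes back from both suggesters, you will probably want to merge the two option lists on the client. A minimal sketch, assuming the response has already been parsed into a plain dict shaped like the result above (collect_suggestions and the trimmed sample are made up for illustration):

```python
def collect_suggestions(response):
    """Merge option texts from all suggesters, keeping first-seen order."""
    seen = set()
    merged = []
    for entries in response.get("suggest", {}).values():
        for entry in entries:
            for option in entry.get("options", []):
                text = option["text"]
                if text not in seen:
                    seen.add(text)
                    merged.append(text)
    return merged

# Trimmed-down sample with only the keys the helper reads.
sample = {
    "suggest": {
        "suggest-exact": [
            {"text": "test", "options": [{"text": "Test this"}]},
        ],
        "suggest-regex": [
            {"text": ".*test.*", "options": [
                {"text": "Test this"},
                {"text": "This is a test"},
                {"text": "this is my test"},
                {"text": "this test"},
            ]},
        ],
    }
}

print(collect_suggestions(sample))
```

Prefix matches are listed first here simply because that suggester appears first in the dict.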

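Based on the user's comments, here is another answer that uses n-grams.

Below is a working example with the index mapping, index data, search query, and search result.

Index mapping: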

{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 4,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    },
    "max_ngram_diff": 50
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}

Index data:

{
  "name": [
    "Test this"
  ]
}

{
  "name": [
    "This is a test"
  ]
}

{
  "name": [
    "this is my test"
  ]
}

{
  "name": [
    "this test"
  ]
}

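Analyze API (ngram_analyzer is defined in the index settings, so the request has to target the index):

POST /stof_64281341/_analyze

{
  "analyzer": "ngram_analyzer",
  "text": "this is my test"
}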

The following tokens are generated:

{
  "tokens": [
    {
      "token": "this",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "test",
      "start_offset": 11,
      "end_offset": 15,
      "position": 3
    }
  ]
}
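To get a feel for what the analyzer does, here is a rough plain-Python imitation of it. It is only an approximation: whitespace splitting stands in for the standard tokenizer, the order of grams within a word is illustrative, and ngram_tokens is a made-up name.

```python
def ngram_tokens(text, min_gram=4, max_gram=20):
    """Lowercase, split on whitespace, then emit n-grams per word."""
    tokens = []
    for word in text.lower().split():
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(word):
                    break  # longer grams will not fit from this position
                tokens.append(word[start:start + size])
    return tokens

names = ["Test this", "This is a test", "this is my test", "this test"]
for name in names:
    print(name, "->", ngram_tokens(name))

# Every name yields the token "test", which is what a match query
# for "test" is looking up.
print(all("test" in ngram_tokens(name) for name in names))
```

Words shorter than min_gram (such as is and my) produce no tokens at all, which is why only this and test show up in the analyze output.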

Search query:

{
    "query": {
        "match": {
           "name": "test"
        }
    }
}

Search result:

"hits": [
      {
        "_index": "stof_64281341",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "Test this"
          ]
        }
      },
      {
        "_index": "stof_64281341",
        "_source": {
          "name": [
            "this is my test"
          ]
        }
      },
      {
        "_source": {
          "name": [
            "This is a test"
          ]
        }
      },
      {
        "_source": {
          "name": [
            "this test"
          ]
        }
      }
    ]

For fuzzy search, you can use the following search query (note that it uses tst in place of test):

{
  "query": {
    "fuzzy": {
      "name": {
        "value": "tst"
      }
    }
  }
}
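The reason tst still finds test is that the fuzzy query tolerates a small edit distance, and with the default AUTO fuzziness a 3 to 5 character term is allowed one edit. A simplified sketch of that rule in plain Python (plain Levenshtein distance, so transpositions are not treated as a single edit; the function names are made up for illustration):

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def auto_fuzziness(term):
    """Allowed edits under AUTO: 0 for 1-2 chars, 1 for 3-5, 2 beyond."""
    n = len(term)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2

def fuzzy_match(query, term):
    return levenshtein(query, term) <= auto_fuzziness(query)

print(fuzzy_match("tst", "test"))  # one insertion away
print(fuzzy_match("tst", "this"))  # too many edits
```

Keep in mind that the fuzzy query is a term-level query, so the value is compared against the indexed tokens as-is.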
