ElasticSearch: Can we apply both an n-gram and a language analyzer during indexing?

Problem description

You can create a custom analyzer based on a language analyzer. The only difference is that you add the ngram_filter token filter to the end of the chain. In that case you first get the language-stemmed tokens (the default chain), which are then turned into edge n-grams (your filter). You can find the implementations of the language analyzers here https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html#english-analyzer in order to override them. Here is an example of the modified English analyzer:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "english_ngram": {
                    "type": "custom",
                    "filter": [
                        "english_possessive_stemmer",
                        "lowercase",
                        "english_stop",
                        "english_stemmer",
                        "ngram_filter"
                    ],
                    "tokenizer": "standard"
                }
            },
            "filter": {
                "english_stop": {
                    "type": "stop"
                },
                "english_stemmer": {
                    "type": "stemmer",
                    "language": "english"
                },
                "english_possessive_stemmer": {
                    "type": "stemmer",
                    "language": "possessive_english"
                },
                "ngram_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 25
                }
            }
        }
    }
}
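
To see what this actually produces, you can run the _analyze API against an index created with these settings (a quick sketch: the index name my_index and the sample text are placeholders, and depending on your Elasticsearch version you may need to pass analyzer and text as URL parameters instead of a request body):

GET http://localhost:9200/my_index/_analyze

{
    "analyzer": "english_ngram",
    "text": "special movies"
}

The language filters run first (for example "movies" is stemmed to "movi"), and only then does ngram_filter expand each stemmed token into its edge n-grams ("m", "mo", "mov", "movi").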

To support special characters, you can try using the whitespace tokenizer instead of standard; in that case those characters become part of the tokens:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "english_ngram": {
                    "type": "custom",
                    "filter": [
                        "english_possessive_stemmer",
                        "lowercase",
                        "english_stop",
                        "english_stemmer",
                        "ngram_filter"
                    ],
                    "tokenizer": "whitespace"
                }
            },
            "filter": {
                "english_stop": {
                    "type": "stop"
                },
                "english_stemmer": {
                    "type": "stemmer",
                    "language": "english"
                },
                "english_possessive_stemmer": {
                    "type": "stemmer",
                    "language": "possessive_english"
                },
                "ngram_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 25
                }
            }
        }
    }
}
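
You can verify the difference the same way (again a sketch with a placeholder index name): with the whitespace tokenizer the special characters survive into the token, so the edge n-grams keep them:

GET http://localhost:9200/my_index/_analyze

{
    "analyzer": "english_ngram",
    "text": "$peci@l movie"
}

Here "$peci@l" stays a single token, so n-grams such as "$", "$p", "$pe" and so on are indexed, whereas the standard tokenizer would drop the "$" and split on the "@".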

Solution

Many thanks @Random. I modified the mapping as follows. For testing I used "movie" as the index type. Note: I also added a search_analyzer; without it I did not get proper results. However, I have the following doubts about using a search_analyzer:

1] Can we use a custom search_analyzer in the case of a language analyzer?
2] Am I getting all the results because of the n-gram analyzer used, rather than because of the English analyzer?

{
    "settings": {
        "analysis": {
            "analyzer": {
                "english_ngram": {
                    "type": "custom",
                    "filter": [
                        "english_possessive_stemmer",
                        "lowercase",
                        "english_stop",
                        "english_stemmer",
                        "ngram_filter"
                    ],
                    "tokenizer": "whitespace"
                },
                "search_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": "lowercase"
                }
            },
            "filter": {
                "english_stop": {
                    "type": "stop"
                },
                "english_stemmer": {
                    "type": "stemmer",
                    "language": "english"
                },
                "english_possessive_stemmer": {
                    "type": "stemmer",
                    "language": "possessive_english"
                },
                "ngram_filter": {
                    "type": "ngram",
                    "min_gram": 1,
                    "max_gram": 25
                }
            }
        }
    },
    "mappings": {
        "movie": {
            "properties": {
                "title": {
                    "type": "string",
                    "fields": {
                        "en": {
                            "type": "string",
                            "analyzer": "english_ngram",
                            "search_analyzer": "search_analyzer"
                        }
                    }
                }
            }
        }
    }
}
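
Regarding question 2], one way to check is to analyze the same query text with both analyzers (a sketch, assuming these settings were applied when the movies index was created):

GET http://localhost:9200/movies/_analyze

{
    "analyzer": "search_analyzer",
    "text": "$peci mov"
}

The custom search_analyzer only splits on whitespace and lowercases, so the query terms "$peci" and "mov" are matched literally against the n-grams stored at index time. If you repeat the call with "analyzer": "english_ngram", you will see the query itself being expanded into n-grams, which is what can make a short query match far more documents than intended when no separate search analyzer is used.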

Update:

Using the search analyzer does not work consistently either, so I need more help.

I used the following mapping as suggested (note: this mapping does not use a search analyzer). For simplicity, let's consider only the English analyzer.

{
    "settings": {
        "analysis": {
            "analyzer": {
                "english_ngram": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "ngram_filter"
                    ]
                }
            },
            "filter": {
                "ngram_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 25
                }
            }
        }
    }
}

Indexed a document:

PUT http://localhost:9200/movies/movie/1

{"title":"$peci@l movie"}

Tried the following query:

GET http://localhost:9200/movies/movie/_search

{
    "query": {
        "multi_match": {
            "query": "$peci mov",
            "fields": ["title"],
            "operator": "and"
        }
    }
}

I get no results. Am I doing something wrong? I am trying to get the following:

1] Special characters
2] Partial matches
3] Space separated partial and full words
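
One way to narrow this down (a sketch, not part of the original mapping; the index and field names follow the example above, and on older Elasticsearch versions field and text go in the URL as query parameters) is to ask Elasticsearch which tokens the title field actually produces for both the stored text and the query text:

GET http://localhost:9200/movies/_analyze

{
    "field": "title",
    "text": "$peci@l movie"
}

GET http://localhost:9200/movies/_analyze

{
    "field": "title",
    "text": "$peci mov"
}

If the first call does not return n-grams such as "$peci" and "mov", a likely culprit is that the simplified settings above define english_ngram but never assign it to the title field in a mapping, so title falls back to the standard analyzer and only whole tokens like "peci", "l" and "movie" are indexed.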

Thanks again!