Problem description
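When I run a search query, I'm struggling to ignore accents and plural/singular forms. I copied the Spanish analyzer from here and kept only the stemmer: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html
You can check my code in Python (I bulk-load the data from a CSV afterwards):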
settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Light Spanish stemmer: reduces plural and singular forms to a common stem
                "spanish_stemmer": {
                    "type": "stemmer",
                    "language": "light_spanish"
                }
            },
            "analyzer": {
                "rebuilt_spanish": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "spanish_stemmer"
                    ]
                }
            }
        }
    }
}
# es is an existing Elasticsearch client instance
es.indices.create(index="activities", body=settings)
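However, when I run a GET query from Insomnia for terms such as geometrico, geométrico, geométricos, or geometricos, I get 0 results, even though there is a document whose title should match. It should match because I want the search to be insensitive to accents and to plural/singular forms. Any ideas?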
The GET query I ran:
Cuerpos geométricos
Solution
You need to add the ASCII folding token filter (asciifolding) to your token filters; see the official documentation for details. Your analyzer should then look like this:
Analyzer:
"analysis": {
    "filter": {
        "spanish_stemmer": {
            "type": "stemmer",
            "language": "light_spanish"
        }
    },
    "analyzer": {
        "rebuilt_spanish": {
            "tokenizer": "standard",
            "filter": [
                "asciifolding",  # ASCII folding token filter
                "lowercase",
                "spanish_stemmer"
            ]
        }
    }
}
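For reference, here is a minimal sketch of how the full index body might look in the question's Python code once the filter is added. One thing worth checking: a custom analyzer only takes effect on fields that reference it (or if it is set as the index default), and the question does not show a mapping. The field name `title` and the helper `build_index_body` are assumptions for illustration, and the typeless `mappings` layout assumes Elasticsearch 7 or later.

```python
def build_index_body(field_name="title"):
    """Build a create-index body with accent folding and light Spanish stemming.

    `field_name` is a hypothetical field; replace it with the real one.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "spanish_stemmer": {
                        "type": "stemmer",
                        "language": "light_spanish",
                    }
                },
                "analyzer": {
                    "rebuilt_spanish": {
                        "tokenizer": "standard",
                        # Filters run in order: fold accents, lowercase, then stem
                        "filter": ["asciifolding", "lowercase", "spanish_stemmer"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                # Attach the custom analyzer to the text field being searched
                field_name: {"type": "text", "analyzer": "rebuilt_spanish"}
            }
        },
    }


body = build_index_body()
# With a client like the question's `es`, this would be passed as:
# es.indices.create(index="activities", body=body)
```

Because analysis is applied when documents are indexed, an existing index would need to be recreated and the CSV data loaded again for the new analyzer to apply to the stored documents.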