Problem description
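I'm new to Elasticsearch. My queries are slow when I need to match multiple search terms or match against nested documents: the first query takes 7-10 seconds, and subsequent ones take 5-6 seconds thanks to Elasticsearch caching. Queries that match non-nested fields are fast, within 100 ms.
I run Elasticsearch on an AWS instance with 250 GB of RAM and 500 GB of disk space. I have one template and 204 indices, with about 107 million documents indexed in total on a single node. Each index has 2 shards, and I keep the heap size at 30 GB.
A document can have more than 50k nested objects, so I raised the nested objects limit to 500k. Searching on this nested object takes far too long, and any OR (should match) operation on fields other than the nested one is also slow. Is there a way to improve query performance on nested objects, or is something wrong with my configuration? And is there a way to make the first query faster as well?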
{
"index_patterns": [
"product_*"
],"template": {
"settings": {
"index.store.type": "mmapfs","number_of_shards":2,"number_of_replicas": 0,"index": {
"store.preload": [
"*"
],"mapping.nested_objects.limit": 500000,"analysis": {
"analyzer": {
"cust_product_name": {
"type": "custom","tokenizer": "standard","filter": [
"lowercase","english_stop","name_wordforms","business_wordforms","english_stemmer","min_value"
],"char_filter": [
"html_strip"
]
},"entity_name": {
"type": "custom","english_stemmer"
],"cust_text": {
"type": "custom","char_filter": [
"html_strip"
]
}
},"filter": {
"min_value": {
"type": "length","min": 2
},"english_stop": {
"type": "stop","stopwords": "_english_"
},"business_wordforms": {
"type": "synonym","synonyms_path": "<some path>/business_wordforms.txt"
},"name_wordforms": {
"type": "synonym","synonyms_path": "<some path>/name_wordforms.txt"
},"english_stemmer": {
"type": "stemmer","language": "english"
}
}
}
}
},"mappings": {
"dynamic": "strict","properties": {
"product_number": {
"type": "text","analyzer": "keyword"
},"product_name": {
"type": "text","analyzer": "cust_case_name"
},"first_fetch_date": {
"type": "date","format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy-MM||yyyy"
},"last_fetch_date": {
"type": "date","review": {
"type": "nested","properties": {
"text": {
"type": "text","analyzer": "cust_text"
},"review_date": {
"type": "date","format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy-MM||yyyy"
}
}
}
}
},"aliases": {
"all_products": {}
}
},"priority": 200,"version": 1}
{
"_source":{
"excludes":["review"]
},"size":1,"track_total_hits":true,"query":{
"nested":{
"path":"review","query":{
"match":{
"review.text":{
"query":"good","zero_terms_query":"none"
}
}
}
}
},"highlight":{
"pre_tags":[
"<b>"
],"post_tags":[
"</b>"
],"fields":{
"product_name":{
}
}
}
}
I'm sure I'm missing something obvious.
Solution
The simple things first: track_total_hits should be set to false. Running a force merge as routine maintenance also helps.
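As a rough sketch of those two suggestions, the query from the question could be rerun with track_total_hits disabled, followed by a force merge. The all_products alias and the product_* pattern come from the template in the question; max_num_segments=1 is an assumption, and force merging is generally only advisable on indices that no longer receive writes.

```json
GET all_products/_search
{
  "_source": { "excludes": ["review"] },
  "size": 1,
  "track_total_hits": false,
  "query": {
    "nested": {
      "path": "review",
      "query": {
        "match": {
          "review.text": { "query": "good", "zero_terms_query": "none" }
        }
      }
    }
  }
}

POST product_*/_forcemerge?max_num_segments=1
```

With track_total_hits set to false, Elasticsearch does not have to count every matching document, so it can stop early once it has the top hit.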
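The difference between the first and subsequent request times is due to Elasticsearch caching.
But if I understand correctly, you can have more than 50k reviews on a single document? If so, that is far too many. Could you consider inverting the mapping, that is, having a reviews index where each review embeds the related product as an object? It should be much faster.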
PUT reviews
{
"mappings": {
"properties": {
"text": {
"type": "text"
},"review_date": {
"type": "date","format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy-MM||yyyy"
},"product": {
"properties": {
"product_number": {
"type": "text","analyzer": "keyword"
},"product_name": {
"type": "text"
},"first_fetch_date": {
"type": "date","format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy-MM||yyyy"
},"last_fetch_date": {
"type": "date","format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy-MM||yyyy"
}
}
}
}
}
}
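With the inverted mapping, the search becomes a plain match on a top-level field, with no nested wrapper. A minimal sketch, assuming the reviews have been reindexed into the reviews index defined above; the search term is the one from the question:

```json
GET reviews/_search
{
  "size": 1,
  "track_total_hits": false,
  "query": {
    "match": {
      "text": { "query": "good" }
    }
  }
}
```

Each hit is a single review carrying its product object, so several hits may belong to the same product.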