Elasticsearch ngram tokenizer returns all results regardless of the query input

Problem description

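I am trying to build a query that searches for records in the following format: TR000002_1_2020

Users should be able to find a record by searching for any of the following:

TR000002, 2_1_2020, TR000002_1_2020, 2020. I think an ngram tokenizer best fits my needs. I am using Elasticsearch 6.8, so I cannot use the built-in search-as-you-type feature introduced in Elasticsearch 7.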

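This is my implementation, which I started from the documentation. The only thing I changed is EdgeNGram -> NGram, because users can search from any point in the text.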
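To illustrate why that switch matters, here is a rough Python approximation of the two tokenizing styles. It is not Elasticsearch code: the helper names `edge_ngrams` and `ngrams` are made up for this sketch, and it assumes the whole ID is treated as one run of token characters and has already been lowercased.

```python
def edge_ngrams(text, min_gram, max_gram):
    """Prefixes of `text` with lengths min_gram..max_gram (edge n-gram style)."""
    longest = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, longest + 1)]


def ngrams(text, min_gram, max_gram):
    """Substrings of `text` with lengths min_gram..max_gram, from every start position."""
    grams = []
    for start in range(len(text)):
        for n in range(min_gram, max_gram + 1):
            if start + n <= len(text):
                grams.append(text[start:start + n])
    return grams


record_id = "tr000002_1_2020"
edge = edge_ngrams(record_id, 2, 16)
full = ngrams(record_id, 2, 16)

print("2020" in edge)        # False: edge n-grams only cover prefixes
print("2020" in full)        # True: plain n-grams cover every position
print(len(edge), len(full))  # 14 105
```

The plain n-gram variant makes mid-string searches such as 2020 possible, at the cost of many more indexed tokens per record.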

My analysis block looks like this:

.Analysis(a => a
    .Analyzers(aa => aa
        .Custom("autocomplete", ca => ca
            .Tokenizer("autocomplete")
            .Filters(new string[] {
                "lowercase"
            })
        )
        .Custom("autocomplete_search", ca => ca
            .Tokenizer("lowercase")
        )
    )
    .Tokenizers(t => t
        .NGram("autocomplete", e => e
            .MinGram(2)
            .MaxGram(16)
            .TokenChars(new TokenChar[] {
                TokenChar.Letter, TokenChar.Digit, TokenChar.Punctuation, TokenChar.Symbol
            })
        )
    )
)

Then in my mapping I define:

.Text(t => t
    .Name(tr => tr.TestRecordId)
    .Analyzer("autocomplete")
    .SearchAnalyzer("autocomplete_search")
)

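When I search for TR000002, the query returns all results, not just the records that contain those specific characters. What am I doing wrong? Is there a better tokenizer for this particular use case? Thanks!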

Edit: here is a sample of what is returned:

{
  "took" : 5,"timed_out" : false,"_shards" : {
    "total" : 5,"successful" : 5,"skipped" : 0,"failed" : 0
  },"hits" : {
    "total" : 27,"max_score" : 0.105360515,"hits" : [
      {
        "_index" : "test-records-development-09-09-2020-02-00-00","_type" : "testrecorddto","_id" : "3","_score" : 0.105360515,"_source" : {
          "id" : 3,"testRecordId" : "TR000002_1_2020","type" : 0,"typeName" : "TIDCo60","missionId" : 1,"mission" : {
            "missionId" : 1,"name" : "[REDACTED]","mRPLUsername" : "[REDACTED]","missionRadiationPartsLead" : {
              "username" : "[REDACTED]","displayName" : "[REDACTED]"
            },"missionInstruments" : [
              {
                "missionId" : 1,"instrumentId" : 1,"cognizantEngineerUsername" : "[REDACTED]","instrument" : {
                  "intstrumentId" : 1,"name" : "Instrument"
                },"cognizantEngineer" : {
                  "username" : "[REDACTED]","displayName" : "[REDACTED]"
                }
              },{
                "missionId" : 1,"instrumentId" : 2,"instrument" : {
                  "intstrumentId" : 2,"name" : "Instrument 2"
                }
              }
            ]
          },"procurementPartId" : 2,"procurementPart" : {
            "procurementPartId" : 2,"partNumber" : "procurement part","part" : {
              "partId" : 1,"manufacturer" : "Texas Instruments","genericPartNumber" : "123","description" : "description","partTechnology" : "Part Tech"
            }
          },"testStatusId" : 12,"testStatus" : {
            "testStatusId" : 12,"name" : "Complete: Postponed Until Further Notice"
          },"discriminator" : "SingleEventEffectsRecord","testRecordServiceOrders" : [
            {
              "testRecordId" : 3,"serviceOrderId" : 9,"serviceOrder" : {
                "serviceOrderId" : 9,"serviceOrderNumber" : "105702"
              }
            }
          ],"rtdbFiles" : [ ],"personnelGroups" : [
            {
              "personnelGroupUsers" : [ ]
            },{
              "personnelGroupUsers" : [ ]
            }
          ],"testRecordTestSubTypes" : [ ],"testRecordTestFacilityConditions" : [ ],"testRecordFollowers" : [ ],"isDeleted" : false,"sEETestRates" : [ ]
        }
      },{
        "_index" : "test-records-development-09-09-2020-02-00-00","_id" : "11","_source" : {
          "id" : 11,"testRecordId" : "TR000011_1_2020","testStatusId" : 1,"testStatus" : {
            "testStatusId" : 1,"name" : "Active"
          },"discriminator" : "TotalIonizingDoseRecord","creatorUsername" : "[REDACTED]","creator" : {
            "username" : "[REDACTED]","displayName" : "[REDACTED]"
          },"testRecordServiceOrders" : [ ],"partLDC" : "12","waferLot" : "1","personnelGroups" : [
            {
              "personnelGroupUsers" : [ ]
            }
          ],"testStartDate" : "2020-07-30T00:00:00","actualCompletionDate" : "2020-07-31T00:00:00"
        }
      },{
        "_id" : "17","_source" : {
          "id" : 17,"testRecordId" : "TR000017_1_2020","cognizantEngineer" : {
            "username" : "lewallen","isDeleted" : false
          }
        }
      },

This is also what the mapping shows:

"testRecordId" : {
  "type" : "text","analyzer" : "autocomplete","search_analyzer" : "autocomplete_search"
},

I should also mention that I have been testing the query in the console like this:

GET test-records-development/_search
{
  "query": {
    "match": {
      "testRecordId": {
        "query": "TR000002_1_2020"
      }
    }
  }
}

Edit 2: added the API response from the index _settings endpoint:

{
  "test-records-development-09-09-2020-02-00-00" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "5","provided_name" : "test-records-development-09-09-2020-02-00-00","creation_date" : "1599617013874","analysis" : {
          "analyzer" : {
            "autocomplete" : {
              "filter" : [
                "lowercase"
              ],"type" : "custom","tokenizer" : "autocomplete"
            },"autocomplete_search" : {
              "type" : "custom","tokenizer" : "lowercase"
            }
          },"tokenizer" : {
            "autocomplete" : {
              "token_chars" : [
                "letter","digit","punctuation","symbol"
              ],"min_gram" : "2","type" : "ngram","max_gram" : "16"
            }
          }
        },"number_of_replicas" : "0","uuid" : "FSeCa0YwRCOJVbjfxYGkig","version" : {
          "created" : "6080199"
        }
      }
    }
  }
}

Solution

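Since I don't have access to your analyzer settings in JSON format, I can't confirm this, but the most likely problem is that your search analyzer autocomplete_search is creating search-time tokens that match index-time tokens of documents you did not intend to match.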

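For example: you are searching for TR000002_1_2020. If the search analyzer produces 2020 as a token, and the document containing TR000011_1_2020 also produced a 2020 token at index time, the query will match that document too.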
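As a rough check against the settings posted in the question, the overlap can be simulated in plain Python. This is only an approximation of the documented tokenizer behavior, not Elasticsearch itself: `lowercase_tokenizer` and `index_tokens` are hypothetical helpers, and the sketch assumes the whole ID is a single run of token characters. Under those assumptions the shared token appears to be `tr` rather than `2020`, because the `lowercase` tokenizer splits on every non-letter and so discards the digits and underscores.

```python
def lowercase_tokenizer(text):
    """Approximates the `lowercase` tokenizer: split on non-letters, lowercase each term."""
    tokens, current = [], []
    for ch in text:
        if ch.isalpha():
            current.append(ch.lower())
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def index_tokens(text, min_gram=2, max_gram=16):
    """Approximates the `autocomplete` analyzer: lowercased n-grams of the whole ID."""
    text = text.lower()
    return {
        text[start:start + n]
        for start in range(len(text))
        for n in range(min_gram, max_gram + 1)
        if start + n <= len(text)
    }


documents = ["TR000002_1_2020", "TR000011_1_2020", "TR000017_1_2020"]
search_terms = lowercase_tokenizer("TR000002_1_2020")
print(search_terms)  # ['tr']

# A match query (default operator: OR) hits any document sharing at least one term.
matches = [doc for doc in documents if set(search_terms) & index_tokens(doc)]
print(matches)  # all three IDs, since each one contains the 2-gram "tr"
```

If the real cluster behaves the same way, every record whose ID starts with TR would match any query that begins with those letters, which is consistent with all 27 hits coming back.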

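You can use the analyze API to check which tokens each analyzer generates. As explained above, the query and the unwanted documents most likely have some tokens in common.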
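A minimal sketch of the two analyze requests worth comparing, written as a small Python helper that only builds the request (method, path and JSON body) without sending it. The helper name `build_analyze_request` is made up for illustration; the index name is the one used in the question's console query, and the `analyzer` and `text` body fields are the ones the `_analyze` endpoint accepts.

```python
import json


def build_analyze_request(index, analyzer, text):
    """Return (method, path, body) for an _analyze call using a named index analyzer."""
    body = {"analyzer": analyzer, "text": text}
    return "GET", f"/{index}/_analyze", json.dumps(body)


index = "test-records-development"
for analyzer in ("autocomplete", "autocomplete_search"):
    method, path, body = build_analyze_request(index, analyzer, "TR000002_1_2020")
    print(method, path)
    print(body)
```

Pasting each printed request into the console and comparing the two token lists should show which terms the search side produces and whether they also appear in the index-time tokens of other records.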
