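Problem description
I am trying to build a query that searches for records in the following format: TR000002_1_2020. Users should be able to search by TR000002, or 2_1_2020, or TR000002_1_2020, or 2020. I think an ngram tokenizer best fits my needs. I am using Elasticsearch 6.8, so I cannot use the built-in search-as-you-type functionality introduced in Elasticsearch 7.
Here is my implementation, which I started from the example in the documentation. The only thing I changed is EdgeNGram -> NGram, because users may search from any point in the text.
My analysis block is as follows: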
.Analysis(a => a
    .Analyzers(aa => aa
        // Index-time analyzer: custom ngram tokenizer plus lowercase filter
        .Custom("autocomplete", ca => ca
            .Tokenizer("autocomplete")
            .Filters(new string[] {
                "lowercase"
            })
        )
        // Search-time analyzer: built-in "lowercase" tokenizer only
        .Custom("autocomplete_search", ca => ca
            .Tokenizer("lowercase")
        )
    )
    .Tokenizers(t => t
        .NGram("autocomplete", e => e
            .MinGram(2)
            .MaxGram(16)
            .TokenChars(new TokenChar[] {
                TokenChar.Letter, TokenChar.Digit, TokenChar.Punctuation, TokenChar.Symbol
            })
        )
    )
)
Then, in my mapping, I define:
.Text(t => t
    .Name(tr => tr.TestRecordId)
    .Analyzer("autocomplete")
    .SearchAnalyzer("autocomplete_search")
)
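When I search for TR000002, the query returns all results, not just the records that contain those specific characters. What am I doing wrong? Is there a better tokenizer for this particular use case? Thanks!
EDIT: Here is a sample of what is returned: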
{
"took" : 5,"timed_out" : false,"_shards" : {
"total" : 5,"successful" : 5,"skipped" : 0,"failed" : 0
},"hits" : {
"total" : 27,"max_score" : 0.105360515,"hits" : [
{
"_index" : "test-records-development-09-09-2020-02-00-00","_type" : "testrecorddto","_id" : "3","_score" : 0.105360515,"_source" : {
"id" : 3,"testRecordId" : "TR000002_1_2020","type" : 0,"typeName" : "TIDCo60","missionId" : 1,"mission" : {
"missionId" : 1,"name" : "[REDACTED]","mRPLUsername" : "[REDACTED]","missionRadiationPartsLead" : {
"username" : "[REDACTED]","displayName" : "[REDACTED]"
},"missionInstruments" : [
{
"missionId" : 1,"instrumentId" : 1,"cognizantEngineerUsername" : "[REDACTED]","instrument" : {
"intstrumentId" : 1,"name" : "Instrument"
},"cognizantEngineer" : {
"username" : "[REDACTED]","displayName" : "[REDACTED]"
}
},{
"missionId" : 1,"instrumentId" : 2,"instrument" : {
"intstrumentId" : 2,"name" : "Instrument 2"
}
}
]
},"procurementPartId" : 2,"procurementPart" : {
"procurementPartId" : 2,"partNumber" : "procurement part","part" : {
"partId" : 1,"manufacturer" : "Texas Instruments","genericPartNumber" : "123","description" : "description","partTechnology" : "Part Tech"
}
},"testStatusId" : 12,"testStatus" : {
"testStatusId" : 12,"name" : "Complete: Postponed Until Further Notice"
},"discriminator" : "SingleEventEffectsRecord","testRecordServiceOrders" : [
{
"testRecordId" : 3,"serviceOrderId" : 9,"serviceOrder" : {
"serviceOrderId" : 9,"serviceOrderNumber" : "105702"
}
}
],"rtdbFiles" : [ ],"personnelGroups" : [
{
"personnelGroupUsers" : [ ]
},{
"personnelGroupUsers" : [ ]
}
],"testRecordTestSubTypes" : [ ],"testRecordTestFacilityConditions" : [ ],"testRecordFollowers" : [ ],"isDeleted" : false,"sEETestRates" : [ ]
}
},{
"_index" : "test-records-development-09-09-2020-02-00-00","_id" : "11","_source" : {
"id" : 11,"testRecordId" : "TR000011_1_2020","testStatusId" : 1,"testStatus" : {
"testStatusId" : 1,"name" : "Active"
},"discriminator" : "TotalIonizingDoseRecord","creatorUsername" : "[REDACTED]","creator" : {
"username" : "[REDACTED]","displayName" : "[REDACTED]"
},"testRecordServiceOrders" : [ ],"partLDC" : "12","waferLot" : "1","personnelGroups" : [
{
"personnelGroupUsers" : [ ]
}
],"testStartDate" : "2020-07-30T00:00:00","actualCompletionDate" : "2020-07-31T00:00:00"
}
},"_id" : "17","_source" : {
"id" : 17,"testRecordId" : "TR000017_1_2020","cognizantEngineer" : {
"username" : "lewallen","isDeleted" : false
}
},
"testRecordId" : {
"type" : "text","analyzer" : "autocomplete","search_analyzer" : "autocomplete_search"
},
I should probably also mention that I have been testing the query in the console like this:
GET test-records-development/_search
{
  "query": {
    "match": {
      "testRecordId": {
        "query": "TR000002_1_2020"
      }
    }
  }
}
EDIT 2: Added the API response from the index _settings endpoint:
{
"test-records-development-09-09-2020-02-00-00" : {
"settings" : {
"index" : {
"number_of_shards" : "5","provided_name" : "test-records-development-09-09-2020-02-00-00","creation_date" : "1599617013874","analysis" : {
"analyzer" : {
"autocomplete" : {
"filter" : [
"lowercase"
],"type" : "custom","tokenizer" : "autocomplete"
},"autocomplete_search" : {
"type" : "custom","tokenizer" : "lowercase"
}
},"tokenizer" : {
"autocomplete" : {
"token_chars" : [
"letter","digit","punctuation","symbol"
],"min_gram" : "2","type" : "ngram","max_gram" : "16"
}
}
},"number_of_replicas" : "0","uuid" : "FSeCa0YwRCOJVbjfxYGkig","version" : {
"created" : "6080199"
}
}
}
}
}
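Solution
I do not have access to your analyzer settings in JSON format, so I cannot confirm this, but the most likely problem is that your search analyzer, autocomplete_search, produces search-time tokens that match index-time tokens of other documents.
For example, you are searching for TR000002_1_2020. If the search analyzer creates 2020 as a token, and the document containing TR000011_1_2020 also has a 2020 token, then the query will match that document too, because a match query by default needs only one token in common.
You can use the analyze API to inspect the tokens that each analyzer generates. As noted above, in most cases you will find some overlapping tokens like the one in the example.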
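To make the token overlap concrete, here is a rough local simulation in Python. It is a sketch, not Elasticsearch itself: ngram_index_tokens, lowercase_tokenizer and matching_ids are hypothetical helpers that only approximate the autocomplete analyzer, the lowercase tokenizer and a default match query, and the three IDs are taken from the sample response in the question. It suggests that the lowercase tokenizer, which splits on every non-letter character, reduces TR000002_1_2020 to the single token tr, and tr is also an ngram of every TR... record. The analyze API remains the authoritative check.

```python
def ngram_index_tokens(text, min_gram=2, max_gram=16):
    """Approximate the `autocomplete` analyzer: all lowercased substrings of
    length min_gram..max_gram. Assumes the whole ID is one chunk, because
    letters, digits and `_` all fall within the configured token_chars."""
    text = text.lower()
    tokens = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            tokens.add(text[start:start + size])
    return tokens


def lowercase_tokenizer(text):
    """Approximate the `lowercase` tokenizer: split on non-letters, lowercase."""
    tokens, current = [], []
    for ch in text:
        if ch.isalpha():
            current.append(ch.lower())
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def matching_ids(search_tokens, ids):
    """Imitate a match query with the default OR operator:
    a document matches if it shares at least one token with the query."""
    return [i for i in ids if set(search_tokens) & ngram_index_tokens(i)]


ids = ["TR000002_1_2020", "TR000011_1_2020", "TR000017_1_2020"]
query = "TR000002_1_2020"

search_tokens = lowercase_tokenizer(query)
print(search_tokens)                      # ['tr']
print(matching_ids(search_tokens, ids))   # all three IDs match on 'tr'

# Hypothetical alternative: keep the query as one lowercased token
print(matching_ids([query.lower()], ids)) # ['TR000002_1_2020']
```

If the analyze API confirms the same tokens on your index, one option to try (an assumption, not verified against 6.8 here) is a search analyzer that keeps the query as a single lowercased token, for example a keyword tokenizer with a lowercase filter, which is what the last line imitates.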