Problem description
I want to run a full-text search on a PostgreSQL column using both the english_stem dictionary and the simple dictionary. I can do something like this:
ALTER TEXT SEARCH CONFIGURATION english_simple_conf
ALTER MAPPING FOR asciiword,asciihword,hword_asciipart,word,hword,hword_part
WITH english_stem,simple;
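But this checks whether a word is in both dictionaries. Is there a way to change this configuration so that a word is matched against one dictionary or the other?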
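For anyone reproducing this, the statement above presupposes that english_simple_conf already exists. A minimal sketch of the setup, assuming the configuration was copied from the built-in english configuration (the post does not say how it was created):

```sql
-- Assumption: start from a copy of the built-in english configuration.
CREATE TEXT SEARCH CONFIGURATION english_simple_conf (COPY = english);

-- Map the word-like token types to english_stem first, then simple.
ALTER TEXT SEARCH CONFIGURATION english_simple_conf
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
    WITH english_stem, simple;
```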
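Edit:
The reason I think the dictionaries are not checked in order is that searching for a partial word that should be found in the simple dictionary returns nothing.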
select * from ts_debug('english','gutter cleaning services');
alias | description | token | dictionaries | dictionary | lexemes
-----------+-----------------+----------+----------------+--------------+----------
asciiword | Word,all ASCII | gutter | {english_stem} | english_stem | {gutter}
blank | Space symbols | | {} | |
asciiword | Word,all ASCII | cleaning | {english_stem} | english_stem | {clean}
blank | Space symbols | | {} | |
asciiword | Word,all ASCII | services | {english_stem} | english_stem | {servic}
select * from ts_debug('simple','gutter cleaning services');
alias | description | token | dictionaries | dictionary | lexemes
-----------+-----------------+----------+--------------+------------+------------
asciiword | Word,all ASCII | gutter | {simple} | simple | {gutter}
blank | Space symbols | | {} | |
asciiword | Word,all ASCII | cleaning | {simple} | simple | {cleaning}
blank | Space symbols | | {} | |
asciiword | Word,all ASCII | services | {simple} | simple | {services}
select name from categories where (to_tsvector('english_simple_conf',name) @@ (to_tsquery('english_simple_conf','cleani:*')));
name
------
(0 rows)
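But searching for a partial word that should be found in the english dictionary returns the expected result.
select name from categories where (to_tsvector('english_simple_conf',name) @@ (to_tsquery('english_simple_conf','clea:*')));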
name
--------------------------
Gutter Cleaning Services
Solution
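"But this checks whether a word is in both dictionaries."
That is not correct. As noted in the docs (see the description of the dictionary_name parameter), the dictionaries are checked in order; the second dictionary is consulted only if the first one did not produce a token. You can verify this with ts_debug():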
testdb=# ALTER TEXT SEARCH CONFIGURATION english_simple_conf
ALTER MAPPING FOR asciiword,asciihword,hword_asciipart,word,hword,hword_part
WITH simple;
ALTER TEXT SEARCH CONFIGURATION
testdb=# select * from ts_debug('public.english_simple_conf','cars boats n0taword');
alias | description | token | dictionaries | dictionary | lexemes
-----------+--------------------------+----------+--------------+------------+------------
asciiword | Word,all ASCII | cars | {simple} | simple | {cars}
blank | Space symbols | | {} | |
asciiword | Word,all ASCII | boats | {simple} | simple | {boats}
blank | Space symbols | | {} | |
numword | Word,letters and digits | n0taword | {simple} | simple | {n0taword}
(5 rows)
testdb=# ALTER TEXT SEARCH CONFIGURATION english_simple_conf
ALTER MAPPING FOR asciiword,hword_part
WITH english_stem,simple;
ALTER TEXT SEARCH CONFIGURATION
testdb=# select * from ts_debug('public.english_simple_conf','cars boats n0taword');
alias | description | token | dictionaries | dictionary | lexemes
-----------+--------------------------+----------+-----------------------+--------------+------------
asciiword | Word,all ASCII | cars | {english_stem,simple} | english_stem | {car}
blank | Space symbols | | {} | |
asciiword | Word,all ASCII | boats | {english_stem,simple} | english_stem | {boat}
blank | Space symbols | | {} | |
numword | Word,letters and digits | n0taword | {simple} | simple | {n0taword}
(5 rows)
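To look at one dictionary in isolation rather than a whole configuration, ts_lexize() can be called directly. A small sketch; the commented results are the ones implied by the ts_debug and to_tsquery output in this post, not new measurements:

```sql
-- Each call asks a single dictionary what it makes of a single token.
SELECT ts_lexize('english_stem', 'cleaning');  -- {clean}, as in the ts_debug output
SELECT ts_lexize('simple', 'cleaning');        -- {cleaning}
SELECT ts_lexize('english_stem', 'cleani');    -- {cleani}: accepted, but left as-is
```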
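The reason for the difference between the last two queries is that english_stem turns 'Cleaning' into 'clean', so a search for 'cleani:*' does not match. Try adding the to_tsvector and to_tsquery expressions as columns and dropping them from the WHERE clause; you will see that 'Gutter Cleaning Services' is stored as 'clean':2 'gutter':1 'servic':3.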
testdb=# select to_tsvector('english_simple_conf',name),to_tsquery('english_simple_conf','cleani:*'),name from categories;
to_tsvector | to_tsquery | name
---------------------------------+------------+--------------------------
'clean':2 'gutter':1 'servic':3 | 'cleani':* | Gutter Cleaning Services
(1 row)
testdb=# select to_tsvector('english_simple_conf',name),to_tsquery('english_simple_conf','cleaning:*'),name from categories;
to_tsvector | to_tsquery | name
---------------------------------+------------+--------------------------
'clean':2 'gutter':1 'servic':3 | 'clean':* | Gutter Cleaning Services
(1 row)
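If you change the tsquery to search for cleaning:*, the query term is stemmed as well, and it matches again. However, english_stem cannot tell that 'cleani' is meant to be 'clean' unless it also sees the 'ng'. So the token comes out unchanged, and you end up with a mismatch: the trailing i is still there in the tsquery but not in the tsvector. Note that a Snowball stemmer such as english_stem accepts every word it is given, so 'cleani' is returned as-is by english_stem itself rather than being passed on to simple; the resulting lexeme is the same either way.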
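The mismatch can be reproduced without any table by casting the lexemes shown above to tsvector and tsquery directly. A minimal sketch; the literals are copied from the output in this answer:

```sql
-- The stored vector only contains 'clean', so the prefix 'cleani' has nothing to match.
SELECT 'clean:2 gutter:1 servic:3'::tsvector @@ 'cleani:*'::tsquery AS matches;  -- false

-- 'clean:*' (what 'cleaning:*' is stemmed to) is a prefix of the stored lexeme 'clean'.
SELECT 'clean:2 gutter:1 servic:3'::tsvector @@ 'clean:*'::tsquery AS matches;   -- true
```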
Stemming is not meant to work on arbitrary prefixes of words, only on whole words; for prefix matching, you would use a traditional left-anchored LIKE.
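As a sketch of what that could look like, together with one possible alternative: the first query is the left-anchored LIKE mentioned above, which only matches from the start of the whole value. The second is not from the original answer; it assumes that matching unstemmed words is acceptable and runs the prefix search against the built-in simple configuration as well, combined with the stemmed search via OR. The temporary table is a hypothetical stand-in for the asker's categories table.

```sql
-- Hypothetical stand-in for the asker's table.
CREATE TEMP TABLE categories (name text);
INSERT INTO categories VALUES ('Gutter Cleaning Services');

-- Left-anchored LIKE: matches a prefix of the whole value, not of each word.
SELECT name FROM categories WHERE lower(name) LIKE 'gutter clea%';

-- Alternative (assumption): try the stemmed configuration first, then the
-- unstemmed one, so 'cleani:*' can match the raw lexeme 'cleaning'.
SELECT name
FROM categories
WHERE to_tsvector('english', name) @@ to_tsquery('english', 'cleani:*')
   OR to_tsvector('simple',  name) @@ to_tsquery('simple',  'cleani:*');
```

On a larger table, each to_tsvector expression would typically want its own expression index; that is left out here.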