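Creating a full-text search configuration with two dictionaries

Question

I want to run a full-text search on a PostgreSQL column using both the english_stem dictionary and the simple dictionary. I can do something like this: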

ALTER TEXT SEARCH CONFIGURATION english_simple_conf
ALTER MAPPING FOR asciiword,asciihword,hword_asciipart,word,hword,hword_part
WITH english_stem,simple;

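But this checks that the word is in both dictionaries. Is there a way to change this configuration so that the word is matched against one dictionary or the other?

Edit:

The reason I think they are not checked in order is that searching for a partial word that should be found in the simple dictionary returns nothing.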

select * from ts_debug('english','gutter cleaning services');

   alias   |   description   |  token   |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+----------+----------------+--------------+----------
 asciiword | Word, all ASCII | gutter   | {english_stem} | english_stem | {gutter}
 blank     | Space symbols   |          | {}             |              |
 asciiword | Word, all ASCII | cleaning | {english_stem} | english_stem | {clean}
 blank     | Space symbols   |          | {}             |              |
 asciiword | Word, all ASCII | services | {english_stem} | english_stem | {servic}

select * from ts_debug('simple','gutter cleaning services');

   alias   |   description   |  token   | dictionaries | dictionary |  lexemes
-----------+-----------------+----------+--------------+------------+------------
 asciiword | Word, all ASCII | gutter   | {simple}     | simple     | {gutter}
 blank     | Space symbols   |          | {}           |            |
 asciiword | Word, all ASCII | cleaning | {simple}     | simple     | {cleaning}
 blank     | Space symbols   |          | {}           |            |
 asciiword | Word, all ASCII | services | {simple}     | simple     | {services}

select name from categories where (to_tsvector('english_simple_conf',name) @@ (to_tsquery('english_simple_conf','cleani:*')));
 name
------
(0 rows)

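But searching for a partial word that is in the English dictionary returns the expected result.

select name from categories where (to_tsvector('english_simple_conf',name) @@ (to_tsquery('english_simple_conf','clea:*')));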

           name
--------------------------
 Gutter Cleaning Services

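Answer

"But this checks that the word is in both dictionaries."

That is not correct. As noted in the docs for ALTER TEXT SEARCH CONFIGURATION (see the description of the dictionary_name parameter), the dictionaries are consulted in order; the second dictionary is only consulted if the first one did not recognize the token. You can verify this with ts_debug().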

testdb=# ALTER TEXT SEARCH CONFIGURATION english_simple_conf 
ALTER MAPPING FOR asciiword,asciihword,hword_asciipart,word,hword,hword_part
WITH simple;
ALTER TEXT SEARCH CONFIGURATION
testdb=# select * from ts_debug('public.english_simple_conf','cars boats n0taword');
   alias   |       description        |  token   | dictionaries | dictionary |  lexemes   
-----------+--------------------------+----------+--------------+------------+------------
 asciiword | Word, all ASCII          | cars     | {simple}     | simple     | {cars}
 blank     | Space symbols            |          | {}           |            | 
 asciiword | Word, all ASCII          | boats    | {simple}     | simple     | {boats}
 blank     | Space symbols            |          | {}           |            | 
 numword   | Word, letters and digits | n0taword | {simple}     | simple     | {n0taword}
(5 rows)

testdb=# ALTER TEXT SEARCH CONFIGURATION english_simple_conf 
ALTER MAPPING FOR asciiword,hword_part
WITH english_stem,simple;
ALTER TEXT SEARCH CONFIGURATION
testdb=# select * from ts_debug('public.english_simple_conf','cars boats n0taword');
   alias   |       description        |  token   |     dictionaries      |  dictionary  |  lexemes   
-----------+--------------------------+----------+-----------------------+--------------+------------
 asciiword | Word, all ASCII          | cars     | {english_stem,simple} | english_stem | {car}
 blank     | Space symbols            |          | {}                    |              | 
 asciiword | Word, all ASCII          | boats    | {english_stem,simple} | english_stem | {boat}
 blank     | Space symbols            |          | {}                    |              | 
 numword   | Word, letters and digits | n0taword | {simple}              | simple       | {n0taword}
(5 rows)
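
The session above assumes that english_simple_conf already exists; the post does not show how it was created. As a minimal sketch, assuming it started as a copy of the built-in english configuration (an assumption, not something stated in the thread), the setup might look like this:

```sql
-- Assumption: the configuration is created as a copy of the built-in english one.
CREATE TEXT SEARCH CONFIGURATION public.english_simple_conf (COPY = pg_catalog.english);

-- Dictionaries are consulted left to right: english_stem first,
-- simple only if english_stem does not recognize the token.
ALTER TEXT SEARCH CONFIGURATION public.english_simple_conf
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
    WITH english_stem, simple;
```

Token types that are not listed in ALTER MAPPING keep whatever mapping they inherited from the copied configuration, which would be consistent with numword still showing {simple} in the ts_debug() output.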

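The reason for the difference between your last two queries is that english_stem turns 'Cleaning' into 'clean', so a search for 'cleani:*' will not match. Try adding the to_tsvector and to_tsquery expressions as columns and removing them from the WHERE clause; you will see that "Gutter Cleaning Services" is stemmed to 'clean':2 'gutter':1 'servic':3.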

testdb=# select to_tsvector('english_simple_conf',name),to_tsquery('english_simple_conf','cleani:*'),name from categories;
           to_tsvector           | to_tsquery |           name           
---------------------------------+------------+--------------------------
 'clean':2 'gutter':1 'servic':3 | 'cleani':* | Gutter Cleaning Services
(1 row)

testdb=# select to_tsvector('english_simple_conf',name),to_tsquery('english_simple_conf','cleaning:*'),name from categories;
           to_tsvector           | to_tsquery |           name           
---------------------------------+------------+--------------------------
 'clean':2 'gutter':1 'servic':3 | 'clean':*  | Gutter Cleaning Services
(1 row)

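If you change the tsquery to search for cleaning:*, it gets stemmed too and matches again. But english_stem cannot figure out that 'cleani' means 'clean' unless it also sees the 'ng'. So the token is left unchanged (english_stem still accepts it, so simple is never consulted, and simple would not alter it anyway), and you end up with a mismatch: the tsquery still has the trailing i, but the tsvector does not.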
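
To see what each dictionary does with a single token, outside of any configuration, ts_lexize() can be called directly. This is only a small sketch; going by the output shown earlier in this thread, the first call should yield {clean} and the second {cleani}.

```sql
-- ts_lexize(dictionary, token) returns the lexemes one dictionary produces.
SELECT ts_lexize('english_stem', 'cleaning');  -- full word: the stemmer can strip the suffix
SELECT ts_lexize('english_stem', 'cleani');    -- truncated word: no suffix to strip
SELECT ts_lexize('simple', 'cleaning');        -- simple (default settings) just lower-cases
```

Because english_stem returns a lexeme for both inputs, the list english_stem,simple never reaches simple for these tokens.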

Stemming is not meant to work on arbitrary prefixes of words, only on whole words; for prefix matching you would use a traditional left-anchored LIKE.
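
As a rough sketch of what that could look like for the categories table from the question (the index name is hypothetical, and lower(name) is an assumption made to get case-insensitive matching):

```sql
-- Left-anchored LIKE: matches only values that START with the given prefix.
SELECT name
FROM categories
WHERE lower(name) LIKE 'gutter cle%';

-- Optional: an expression index that lets the planner use a btree for the
-- pattern above regardless of locale (hypothetical index name).
CREATE INDEX categories_name_prefix_idx
    ON categories (lower(name) text_pattern_ops);

-- Alternative for a prefix of a word in the middle of the value:
-- query with the simple configuration, which does not stem.
SELECT name
FROM categories
WHERE to_tsvector('simple', name) @@ to_tsquery('simple', 'cleani:*');
```

Note that the LIKE form is anchored to the start of the column value, so 'cleani%' alone would not find "Gutter Cleaning Services". The last query avoids that: since the simple configuration keeps 'cleaning' unstemmed, the prefix 'cleani' should match it.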