Problem description
from itertools import islice
import pkg_resources
from symspellpy import SymSpell
sym_spell = SymSpell()
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt")
# term_index=0 (word column), count_index=1 (frequency column)
sym_spell.load_dictionary(dictionary_path, 0, 1)
# Print out first 5 elements to demonstrate that dictionary is
# successfully loaded
print(list(islice(sym_spell.words.items(), 5)))
Result:
[('the', 23135851162), ('of', 13151942776), ('and', 12997637966), ('to', 12136980858), ('a', 9081174698)]
from itertools import islice
import pkg_resources
from symspellpy import SymSpell
sym_spell = SymSpell()
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt")
# term_index=0 (first word column), count_index=2 (frequency column)
sym_spell.load_bigram_dictionary(dictionary_path, 0, 2)
# Print out first 5 elements to demonstrate that dictionary is
# successfully loaded
print(list(islice(sym_spell.bigrams.items(), 5)))
[('abcs of', 10956800), ('aaron and', 10721728), ('abbott and', 7861376), ('abbreviations and', 13518272), ('aberdeen and', 7347776)]
According to this page:
https://symspellpy.readthedocs.io/en/latest/examples/dictionary.html
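Solution
The second example, both on the linked page and in your question, references the wrong data file. You must reference the bigram data file that ships with the package.
The documentation for these examples shows the expected data format for each one, and the two formats differ. Yet both examples reference the same data file, so one of them must be wrong. The mistake is in the second example, which should reference the bigram data file.
Here is the complete code that runs correctly: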
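To illustrate why the format mismatch matters, here is a minimal sketch of column-based parsing. The helper `parse_bigram_lines` is hypothetical and only a simplified stand-in, not symspellpy's actual loader. It assumes space-separated lines and that lines with too few columns are skipped. The sample lines are built from the entries shown in the outputs in this post.

```python
def parse_bigram_lines(lines, term_index=0, count_index=2):
    """Hypothetical helper: map 'word1 word2' -> count from space-separated lines."""
    bigrams = {}
    for line in lines:
        parts = line.split()
        # A bigram line needs two words plus a count, i.e. at least 3 columns.
        if len(parts) < 3 or len(parts) <= count_index:
            continue
        # Skip lines whose count column is not a plain integer.
        if not parts[count_index].isdigit():
            continue
        key = f"{parts[term_index]} {parts[term_index + 1]}"
        bigrams[key] = int(parts[count_index])
    return bigrams


# Unigram-style lines: one word and one count (2 columns).
unigram_lines = ["the 23135851162", "of 13151942776"]
# Bigram-style lines: two words and one count (3 columns).
bigram_lines = ["abcs of 10956800", "aaron and 10721728"]

unigram_result = parse_bigram_lines(unigram_lines)
bigram_result = parse_bigram_lines(bigram_lines)
print(unigram_result)
print(bigram_result)
```

In this sketch the two-column lines never reach the count column at index 2, so nothing is collected from them, while the three-column lines parse as expected.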
from itertools import islice
import pkg_resources
from symspellpy import SymSpell
sym_spell = SymSpell()
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_bigramdictionary_en_243_342.txt")  # << - fixed to refer to the bigram data file
# term_index=0 (first word column), count_index=2 (frequency column)
sym_spell.load_bigram_dictionary(dictionary_path, 0, 2)
# Print out first 5 elements to demonstrate that dictionary is
# successfully loaded
print(list(islice(sym_spell.bigrams.items(), 5)))
Result:
[('abcs of', 10956800), ('aaron and', 10721728), ('abbott and', 7861376), ('abbreviations and', 13518272), ('aberdeen and', 7347776)]