Fuzzy n-gram matching against a dictionary

Problem description

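Given a variable-length string S and a dictionary D of n-grams N, I want to:

1. Extract all N in S that match under fuzzy-matching logic (to catch spelling errors)
2. Extract all numbers in S
3. Display the results in the same order in which they appear in S

I have managed points 1 and 2, but my approach, which builds n-grams from S and fuzzy-matches them against the dictionary (plus number matching), does not keep the items in the order they appear in S: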

from nltk import everygrams
import re

string = "Hello everybody,today we have 2.000 cell phones here"
# All 1- to 4-word n-grams of the whitespace-split sentence
ngrams = list(everygrams(string.split(), 1, 4))

my_dict = {
    "brand": "ITEM_01",
    "model": "ITEM_02",
    "cell phone": "ITEM_04",
    "today": "ITEM_05",
}

results = []  # list with final results
# FuzzyDict is not defined in this post; it is assumed to be a dict-like
# class whose lookup returns the value of the closest key and raises
# KeyError when nothing is close enough.
d = FuzzyDict(my_dict)  # create the dictionary for fuzzy matching

# Formatted numbers: optional sign or "(", optional "$", digits with
# optional comma groups and a decimal part, optional ")"
number_re = re.compile(r'(?:[+-]|\()?\$?\d+(?:,\d+)*(?:\.\d+)?\)?')

for k in ngrams:
    candidate = ' '.join(k)
    print(f"Searching for {candidate}")

    try:
        # match the n-gram against the dictionary using fuzzy lookup
        result = d[candidate]
        print(f"Found {result}")
        results.append(result)
    except KeyError:
        print("No fuzzy match found")

    # append any numbers found in the n-gram
    results.extend(number_re.findall(candidate))

# NOTE: the order of appearance in S is not kept!

# keep unique values, since this approach extracts the same item several times
results_unique = list(set(results))

This should give me "ITEM_05 2.000 ITEM_04" (right now the order is arbitrary).
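One contributor to the arbitrary order is the final `set(...)` call, since Python sets are unordered. A minimal sketch of order-preserving de-duplication with `dict.fromkeys` follows; the helper name `unique_in_order` and the sample list are made up for illustration. It only keeps first-seen order, so the items still have to be collected in sentence order for the result to be right.

```python
def unique_in_order(items):
    """Drop duplicates while keeping the first occurrence of each item."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(items))


# Hypothetical raw results containing repeats
raw = ["ITEM_05", "2.000", "ITEM_05", "ITEM_04", "2.000"]
print(unique_in_order(raw))
```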

Solution
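One possible approach, sketched here with the standard library only: instead of generating every n-gram up front, scan the tokens of S from left to right and, at each position, try the longest n-gram first. Numbers are emitted as they are met, and a matched n-gram consumes its tokens, so the output follows the order of S. This is a sketch under assumptions, not a definitive implementation: `difflib.SequenceMatcher` stands in for the `FuzzyDict` class that the question does not show, the number pattern is a simplified version of the one in the question, and the names `best_label` and `extract_in_order`, the tokenizer regex, `max_n=4` and `cutoff=0.85` are all illustrative choices.

```python
import re
from difflib import SequenceMatcher

# Simplified number pattern (assumption): digits with optional . or , groups
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")
# A token is either such a number or a run of letters; punctuation is skipped
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|[^\W\d_]+")


def best_label(candidate, dictionary, cutoff):
    """Return the value of the closest key, or None if no key reaches cutoff."""
    best_key, best_score = None, cutoff
    for key in dictionary:
        score = SequenceMatcher(None, candidate.lower(), key.lower()).ratio()
        if score >= best_score:
            best_key, best_score = key, score
    return dictionary[best_key] if best_key is not None else None


def extract_in_order(text, dictionary, max_n=4, cutoff=0.85):
    """Collect numbers and fuzzy dictionary hits in order of appearance."""
    tokens = TOKEN_RE.findall(text)
    hits = []
    i = 0
    while i < len(tokens):
        if NUMBER_RE.fullmatch(tokens[i]):
            hits.append(tokens[i])
            i += 1
            continue
        # Try the longest n-gram starting at position i first
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            label = best_label(" ".join(tokens[i:i + n]), dictionary, cutoff)
            if label is not None:
                hits.append(label)
                i += n  # consume the matched tokens
                break
        else:
            i += 1  # nothing matched at this position
    return hits


my_dict = {
    "brand": "ITEM_01",
    "model": "ITEM_02",
    "cell phone": "ITEM_04",
    "today": "ITEM_05",
}
text = "Hello everybody,today we have 2.000 cell phones here"
print(" ".join(extract_in_order(text, my_dict)))
```

With the sample sentence this prints ITEM_05, 2.000 and ITEM_04 in that order. Because matched tokens are consumed, overlapping n-grams no longer produce duplicates, so no `set` is needed; a term that genuinely occurs twice in S is reported twice.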


fuzzy-search fuzzywuzzy n-gram python regex
