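Problem description
Given a variable-length string S and a dictionary D of n-grams N, I want to:
I have completed points 1 and 2, but my approach, which builds n-grams from S and fuzzy-matches them against the dictionary (plus number matching), does not preserve the order in which the items appear in S: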
from nltk import everygrams
import re

# NOTE: FuzzyDict is neither defined nor imported in the original snippet. It is assumed
# to be a dict-like class whose key lookup is fuzzy and which raises an exception on a miss.

string = "Hello everybody,today we have 2.000 cell phones here"
# all 1-grams to 4-grams over the whitespace-separated tokens
ngrams = list(everygrams(string.split(), 1, 4))
my_dict = {
    "brand": "ITEM_01", "model": "ITEM_02", "cell phone": "ITEM_04", "today": "ITEM_05"
}
results = []  # list with final results
d = FuzzyDict(my_dict)  # create the dictionary for fuzzy matching
for k in ngrams:
    candidate = ' '.join(k)
    print(f"Searching for {candidate}")
    try:
        # match the n-gram against the dictionary using fuzzy matching
        result = d[candidate]
        print(f"Found {result}")
        results.append(result)
    except Exception:
        print("An exception occurred")
    # match numbers in various formats (sign, currency symbol, separators, decimals)
    numbers = re.findall(r'(?:[+-]|\()?\$?\d+(?:,\d+)*(?:\.\d+)?\)?', candidate)
    # append the numbers to the list
    results.extend(numbers)
# NOTE: the order of appearance in S is not kept!
# keep unique values, since this approach extracts several instances of the same item;
# converting to a set is what discards the order
myset = set(results)
results_unique = list(myset)
This should give me "ITEM_05 2.000 ITEM_04" (at the moment the order is arbitrary).
Solution
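One possible approach, offered as a sketch rather than a definitive answer: scan the tokens of S from left to right, try the longest n-gram first at each position, and record every hit in the order it is found. Duplicates are then removed with `dict.fromkeys`, which keeps first-seen order, instead of `set`. Several things here are assumptions and not part of the question: `FuzzyDict` is replaced by the standard library's `difflib.get_close_matches`, the helper names `extract_in_order`, `TOKEN_RE` and `NUMBER_RE` are hypothetical, the similarity cutoff of 0.8 is an arbitrary choice, and tokenization uses a regular expression rather than `string.split()` so that "everybody,today" is split at the comma.

```python
import re
from difflib import get_close_matches

# Number pattern taken from the question; every group is non-capturing.
NUMBER = r'(?:[+-]|\()?\$?\d+(?:,\d+)*(?:\.\d+)?\)?'
NUMBER_RE = re.compile(NUMBER)
# A token is either a number or a run of word characters (assumed tokenization).
TOKEN_RE = re.compile(NUMBER + r'|\w+')


def extract_in_order(text, mapping, max_n=4, cutoff=0.8):
    """Return dictionary values and numbers in the order they appear in text."""
    tokens = TOKEN_RE.findall(text)
    keys = list(mapping)
    found = []
    i = 0
    while i < len(tokens):
        # Numbers are kept verbatim.
        if NUMBER_RE.fullmatch(tokens[i]):
            found.append(tokens[i])
            i += 1
            continue
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            match = get_close_matches(candidate, keys, n=1, cutoff=cutoff)
            if match:
                found.append(mapping[match[0]])
                i += n  # skip the tokens consumed by this match
                break
        else:
            i += 1  # nothing matched at this position
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(found))


my_dict = {
    "brand": "ITEM_01", "model": "ITEM_02", "cell phone": "ITEM_04", "today": "ITEM_05"
}
string = "Hello everybody,today we have 2.000 cell phones here"
print(" ".join(extract_in_order(string, my_dict)))  # ITEM_05 2.000 ITEM_04
```

Because each match advances the scan position past the tokens it consumed, overlapping n-grams no longer produce repeated hits, and the result list follows the order of S by construction. The cutoff may need tuning for other dictionaries: a lower value makes longer n-grams match short keys more easily.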