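Problem description
I have a very large dictionary that stores a large number of English sentences and their Spanish translations. My original code is as follows: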
from fuzzywuzzy import process
sentencePairs = {'How are you?':'¿Cómo estás?','Good morning!':'¡Buenos días!'}
query= 'How old are you?'
match = process.extractOne(query,sentencePairs.keys())[0]
print(match,sentencePairs[match],sep='\n')
I then switched from fuzzywuzzy to RapidFuzz to get more speed. I also tried multithreading, but surprisingly it did not help much. My new code is as follows:
from rapidfuzz import process,utils,fuzz
from concurrent.futures import ThreadPoolExecutor
import time,string,random
random.seed(18)
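def findMatch(query,dictionary):
    # Read the first two fields by index: recent RapidFuzz releases return
    # (match, score, index) here, while older ones returned (match, score).
    result = process.extractOne(
        utils.default_process(query),dictionary.keys(),processor=None,scorer=fuzz.ratio)
    return (result[0],result[1])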
# make a dictionary for testing
d = {
    ''.join(random.choice(string.ascii_lowercase + string.digits)
            for _ in range(15)
            ): "spanish text"
    for s in range(1000000)
}
d['how are you?'] = '¿Cómo estás?'
# split the dictionary in half for multithreading
d1 = dict(list(d.items())[:len(d)//2])
d2 = dict(list(d.items())[len(d)//2:])
query= 'How old are you?'
# ---with multithreading---
start_time1 = time.time()
print('Start matching with multithreading...')
with ThreadPoolExecutor() as executor:
    future = executor.submit(findMatch,query,d1)
    match1,score1 = future.result()
with ThreadPoolExecutor() as executor:
    future = executor.submit(findMatch,query,d2)
    match2,score2 = future.result()
if score1 >= score2 and score1 > 70:
    print(match1,d[match1],sep=' - ')
elif score2 > score1 and score2 > 70:
    print(match2,d[match2],sep=' - ')
else:
    print('No match found.')
print('Time spent with multithreading: {}\n'.format(time.time() - start_time1))
# ---without multithreading---
start_time2 = time.time()
print('Start matching without multithreading...')
match,score = findMatch(query,d)
if score > 70:
    print(match,d[match],sep=' - ')
print('Time spent without multithreading: {}'.format(time.time() - start_time2))
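I expected multithreading to cut the matching time considerably, but in practice it did the opposite. Is there a way to significantly reduce the matching time? Or am I using multithreading incorrectly?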
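On the second question, one thing worth checking is the order of the calls. In the code above, future.result() is called right after each submit, so the second search is only submitted once the first has finished, and the two halves run one after the other. The sketch below illustrates that effect with the standard library only. The names stand_in_search, run_back_to_back, run_overlapping and overlaps are hypothetical, and time.sleep is an assumed stand-in for the real scoring work, so this shows the scheduling behaviour only and says nothing about RapidFuzz timings.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def stand_in_search(label, delay=0.2):
    """Hypothetical stand-in for findMatch: sleeps instead of scoring strings."""
    started = time.monotonic()
    time.sleep(delay)  # sleeping releases the GIL, so threads can overlap here
    return label, started, time.monotonic()

def run_back_to_back():
    # Same shape as the question: result() blocks before the second submit,
    # so the second task cannot start until the first has finished.
    with ThreadPoolExecutor() as executor:
        first = executor.submit(stand_in_search, "d1").result()
    with ThreadPoolExecutor() as executor:
        second = executor.submit(stand_in_search, "d2").result()
    return first, second

def run_overlapping():
    # Submit both tasks first, then collect the results.
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(stand_in_search, name) for name in ("d1", "d2")]
        first, second = [f.result() for f in futures]
    return first, second

def overlaps(first, second):
    """True if the second task started before the first one ended."""
    return second[1] < first[2]

serial = run_back_to_back()
parallel = run_overlapping()
print("back to back overlaps:", overlaps(*serial))
print("submitted together overlaps:", overlaps(*parallel))
```

Submitting both tasks before collecting results is what lets them run at the same time. Whether that would actually shorten the real process.extractOne calls is a separate question this sketch does not answer: in standard CPython, CPU-bound work in threads only runs in parallel when the underlying code releases the GIL, and splitting the dictionary with dict(list(d.items())...) adds copying cost of its own.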