Problem description
I'm using `cdist` from SciPy to compute pairwise distances between the elements of two 1-D arrays of strings, like this:
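```python
import numpy as np
import textdistance
from scipy.spatial.distance import cdist
from time import time

first_ = np.array(["hello gtjj rgreg", "hellllo zefze ergee"])
second_ = np.array(["hlo asad gerg", "alle gtrhh gerg"])
first_ = np.tile(first_, 100)
second_ = np.tile(second_, 100)

start_time = time()
# Each string becomes a 1-element row, so the metric reads it as a[0] / b[0]
mat_to_compare = cdist(second_[:, np.newaxis], first_[:, np.newaxis],
                       lambda a, b: textdistance.cosine(a[0], b[0]))
mat_to_compare = cdist(second_[:, np.newaxis], first_[:, np.newaxis],
                       lambda a, b: textdistance.hamming.normalized_distance(a[0], b[0]))
mat_to_compare = cdist(second_[:, np.newaxis], first_[:, np.newaxis],
                       lambda a, b: textdistance.prefix.normalized_distance(a[0], b[0]))
mat_to_compare = cdist(second_[:, np.newaxis], first_[:, np.newaxis],
                       lambda a, b: textdistance.postfix.normalized_distance(a[0], b[0]))
mat_to_compare = cdist(second_[:, np.newaxis], first_[:, np.newaxis],
                       lambda a, b: textdistance.jaro_winkler(a[0], b[0]))
execution_time = time() - start_time
print(execution_time)
```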
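Why rewriting this as a loop may not help: when `metric` is a Python callable, `cdist` has no compiled fast path and, as far as its source shows, simply calls that function once for every (row of XA, row of XB) pair. The sketch below makes this visible with a call counter. `length_gap` is a hypothetical stand-in metric (not from the question) so the example runs without `textdistance`.

```python
import numpy as np
from scipy.spatial.distance import cdist

call_count = 0

def length_gap(a, b):
    """Hypothetical stand-in metric: absolute difference of string lengths.

    cdist passes each row as a 1-element array, so the strings are a[0] and b[0].
    """
    global call_count
    call_count += 1
    return abs(len(a[0]) - len(b[0]))

# Smaller versions of the question's arrays: 6 and 4 strings
first_ = np.tile(np.array(["hello gtjj rgreg", "hellllo zefze ergee"]), 3)
second_ = np.tile(np.array(["hlo asad gerg", "alle gtrhh gerg"]), 2)

dm = cdist(second_[:, np.newaxis], first_[:, np.newaxis], length_gap)

print(dm.shape)    # (4, 6)
print(call_count)  # 24: one Python call per pair, i.e. mA * mB calls
```

With five metrics on 200 × 200 strings, that is 5 × 40,000 interpreted Python calls whichever way the loop is written, which would be consistent with both versions taking about the same time.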
Then, to compute the distance matrices faster, I studied the `cdist` source code and tried building the matrices with explicit loops:
```python
start_time = time()
# Same setup as the callable branch of cdist
XA, XB = second_[:, np.newaxis], first_[:, np.newaxis]
sA, sB = XA.shape, XB.shape
mA = sA[0]
mB = sB[0]
# One output matrix per metric
dm1 = np.empty((mA, mB), dtype=np.double)
dm2 = np.empty((mA, mB), dtype=np.double)
dm3 = np.empty((mA, mB), dtype=np.double)
dm4 = np.empty((mA, mB), dtype=np.double)
dm5 = np.empty((mA, mB), dtype=np.double)
# Visit every pair once and compute all five metrics for it
for i in range(mA):
    for j in range(mB):
        dm1[i, j] = textdistance.cosine(XA[i][0], XB[j][0])
        dm2[i, j] = textdistance.hamming.normalized_distance(XA[i][0], XB[j][0])
        dm3[i, j] = textdistance.prefix.normalized_distance(XA[i][0], XB[j][0])
        dm4[i, j] = textdistance.postfix.normalized_distance(XA[i][0], XB[j][0])
        dm5[i, j] = textdistance.jaro_winkler(XA[i][0], XB[j][0])
execution_time = time() - start_time
print(execution_time)
```
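If the per-pair loop is kept, one small cleanup is to convert the arrays to plain Python lists once instead of indexing NumPy arrays (`XA[i][0]`) inside the hot loop. This is a sketch under that assumption: `pairwise_matrices` is a hypothetical helper, and the two lambdas are stand-in metrics so it runs without `textdistance`. Each metric is still called once per pair, so expect only a modest gain, since the string metrics themselves likely dominate the runtime.

```python
import numpy as np

def pairwise_matrices(rows, cols, metrics):
    """Hypothetical helper: build one (len(rows), len(cols)) matrix per metric.

    rows, cols: 1-D arrays (or lists) of strings.
    metrics: dict mapping a name to a callable f(str, str) -> float.
    """
    # Convert once to plain Python str objects, avoiding NumPy indexing
    # and numpy.str_ scalars inside the inner loop.
    rows = np.asarray(rows).tolist()
    cols = np.asarray(cols).tolist()
    return {
        name: np.array([[f(a, b) for b in cols] for a in rows], dtype=np.double)
        for name, f in metrics.items()
    }

first_ = np.tile(np.array(["hello gtjj rgreg", "hellllo zefze ergee"]), 100)
second_ = np.tile(np.array(["hlo asad gerg", "alle gtrhh gerg"]), 100)

# Stand-in metrics; with textdistance installed you could pass callables such as
# textdistance.hamming.normalized_distance in the same dict.
metrics = {
    "length_gap": lambda a, b: abs(len(a) - len(b)),
    "first_char_differs": lambda a, b: float(a[:1] != b[:1]),
}
mats = pairwise_matrices(second_, first_, metrics)
print(mats["length_gap"].shape)  # (200, 200)
```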
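However, after trying both approaches, the execution times were almost identical. Can anyone see a way to speed up the computation of all these matrices?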
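One option that does apply to the data as written: because `first_` and `second_` are built with `np.tile`, each contains only two distinct strings, so almost all of the 40,000 pairs per metric are repeats. A sketch, assuming the real data also has many repeated strings (if every string is unique this gives no speedup): evaluate the metric only on the unique strings via `np.unique(..., return_inverse=True)`, then expand back to the full matrix with `np.ix_`. `dedup_pairwise` is a hypothetical helper name, and `length_gap` is a stand-in for the `textdistance` metrics.

```python
import numpy as np

def dedup_pairwise(XA, XB, metric):
    """Hypothetical helper: evaluate `metric` on unique strings only, then expand.

    XA, XB: 1-D arrays of strings. metric: callable f(str, str) -> float.
    Returns the same (len(XA), len(XB)) matrix as a full pairwise loop.
    """
    ua, inv_a = np.unique(XA, return_inverse=True)
    ub, inv_b = np.unique(XB, return_inverse=True)
    ua_list, ub_list = ua.tolist(), ub.tolist()
    # Small matrix over unique strings only: len(ua) * len(ub) metric calls
    small = np.array([[metric(a, b) for b in ub_list] for a in ua_list],
                     dtype=np.double)
    # Map every original (i, j) back to its unique pair
    return small[np.ix_(inv_a, inv_b)]

first_ = np.tile(np.array(["hello gtjj rgreg", "hellllo zefze ergee"]), 100)
second_ = np.tile(np.array(["hlo asad gerg", "alle gtrhh gerg"]), 100)

# Stand-in metric; textdistance.jaro_winkler etc. could be passed instead
length_gap = lambda a, b: abs(len(a) - len(b))
dm = dedup_pairwise(second_, first_, length_gap)  # 4 metric calls instead of 40,000
print(dm.shape)  # (200, 200)
```

When the strings are mostly unique, the per-pair Python calls remain the bottleneck; splitting the rows across worker processes with the standard library's `concurrent.futures.ProcessPoolExecutor` is one further option, not shown here.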