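Problem description
I have a list of 2D arrays (of different lengths), and I need to apply certain functions to these arrays efficiently through a list comprehension.
Since this is still not fast enough, the list comprehension needs to be parallelized.
What is the right way to do this while preserving the order of the slices (or "subarrays")?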
def get_slice_max(arr):
    '''
    Return the slice with every element replaced by the maximum value
    that has occurred so far (up to and including that element).
    '''
    result = [arr[0]]
    for i in range(1, len(arr)):
        result.append(max(result[-1], arr[i]))
    return result

result = [get_slice_max(slice_) for slice_ in a]
Reproducible sample:
import random
import numpy as np

a = [np.array(range(1, random.randint(3, 8))) for x in range(10000)]
Edit: I need list comprehensions like the following to be parallelized:
temp = np.random.randint(1, high=100, size=10)  # determines the sizes of the subarrays
A = [np.random.randint(0, high=2, size=x) for x in temp]  # 0/1 flags (high is exclusive)
B = [np.random.uniform(size=x) for x in temp]
C = [np.random.uniform(size=x) for x in temp]
result = [[y if x == 1 else z for x, y, z in zip(a, b, c)]
          for a, b, c in zip(A, B, C)]
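temp = np.random.randint(1, high=100, size=10)  # determines the sizes of the subarrays (upper bound assumed, as in the first example)
D = [np.random.uniform(size=x) for x in temp]
E = [np.random.randint(1, high=100, size=x) for x in temp]  # bounds assumed; starting at 1 avoids division by zero
[[x / y for x, y in zip(d, np.maximum.accumulate(get_slice_max(e)))] for d, e in zip(D, E)]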
Solution
Use numpy.maximum.accumulate:
# Sample
a = [np.random.randint(1,10,np.random.randint(3,8)) for _ in range(10000)]
a[:3]
# [array([4,5,6]),array([7,2,8,9,5]),array([5,1,7,5])]
[np.maximum.accumulate(arr) for arr in a]
Output (first three arrays):
[array([4,5,6]),array([7,7,8,9,9]),array([5,5,7,7])]
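To make the cumulative maximum concrete without NumPy, here is a minimal standard-library sketch that uses itertools.accumulate with max as the combining function. The helper name running_max is hypothetical and not part of the original answer; it only illustrates what np.maximum.accumulate computes on the three sample arrays shown above.

```python
from itertools import accumulate

def running_max(values):
    """Hypothetical helper: running maximum of a sequence, returned as a list."""
    # accumulate(values, max) yields max(previous_result, current_item) at each step
    return list(accumulate(values, max))

sample = [[4, 5, 6], [7, 2, 8, 9, 5], [5, 1, 7, 5]]
print([running_max(s) for s in sample])
# [[4, 5, 6], [7, 7, 8, 9, 9], [5, 5, 7, 7]]
```

This pure-Python version is mainly useful for checking results by hand; on NumPy arrays, np.maximum.accumulate is the more natural choice.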
Verification:
all(np.array_equal(get_slice_max(arr),np.maximum.accumulate(arr)) for arr in a)
# True
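The two comprehensions from the question's edit can probably be handled in the same spirit, by replacing the inner Python loop with a vectorized NumPy call. The sketch below assumes that the flag arrays contain only 0 and 1 and that the divisor arrays are strictly positive; the seeded generator and the variable names selected and ratios are illustrative choices, not part of the original post.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (an assumption, for reproducibility)
sizes = rng.integers(1, 100, size=10)  # sizes of the subarrays

A = [rng.integers(0, 2, size=n) for n in sizes]  # 0/1 flags
B = [rng.uniform(size=n) for n in sizes]
C = [rng.uniform(size=n) for n in sizes]
# Element-wise choice: take b where the flag is 1, otherwise c
selected = [np.where(a == 1, b, c) for a, b, c in zip(A, B, C)]

D = [rng.uniform(size=n) for n in sizes]
E = [rng.integers(1, 100, size=n) for n in sizes]  # starts at 1 to avoid dividing by zero
# Divide each d element-wise by the running maximum of the matching e
ratios = [d / np.maximum.accumulate(e) for d, e in zip(D, E)]
```

The outer list comprehension stays because the subarrays have different lengths, but each element-wise step now runs inside NumPy rather than in a Python loop.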
Benchmark (roughly 5x faster on these timings):
%timeit [np.maximum.accumulate(arr) for arr in a]
# 6.07 ms ± 498 µs per loop (mean ± std. dev. of 7 runs,100 loops each)
%timeit [get_slice_max(arr) for arr in a]
# 32.4 ms ± 11 ms per loop (mean ± std. dev. of 7 runs,10 loops each)
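If the work still needs to be spread over several workers, one option is the map method of the executors in concurrent.futures, which returns results in the order of the inputs, so the order of the slices is preserved. The following is a minimal sketch under stated assumptions rather than a tuned solution: it uses a thread pool so that it runs as-is, plain lists stand in for the NumPy arrays, and max_workers=4 is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor

def get_slice_max(arr):
    """Running maximum, same logic as the function in the question."""
    result = [arr[0]]
    for value in arr[1:]:
        result.append(max(result[-1], value))
    return result

# Plain lists stand in for the NumPy arrays of the question (an assumption)
slices = [[4, 5, 6], [7, 2, 8, 9, 5], [5, 1, 7, 5]]

with ThreadPoolExecutor(max_workers=4) as executor:
    # map() yields results in input order, regardless of which task finishes first
    parallel_result = list(executor.map(get_slice_max, slices))

print(parallel_result)
# [[4, 5, 6], [7, 7, 8, 9, 9], [5, 5, 7, 7]]
```

Threads are unlikely to speed up a pure-Python loop like this one because of the global interpreter lock. ProcessPoolExecutor offers the same order-preserving map, but it should be called under an if __name__ == "__main__": guard on platforms that spawn worker processes, and for arrays this small the inter-process overhead may well outweigh the gain. The vectorized approach above is therefore probably worth trying first.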