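How to parallelize a function in a list comprehension while preserving order

Problem description

I have a list of 2D arrays (of different lengths), and I need to apply certain functions to these arrays efficiently through a list comprehension.

Since that is still not fast enough, the list comprehension needs to be parallelized.

What is the correct way to do this while also preserving the order of the slices (or "subarrays")?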

def get_slice_max(arr):
    '''
    Return the slice with every element replaced by the maximum value
    that has occurred up to and including that position.
    '''
    result = [arr[0]]
    for i in range(1, len(arr)):
        result.append(max(result[-1], arr[i]))
    return result

result = [get_slice_max(slice_) for slice_ in a]

Reproducible sample:

import random
import numpy as np

a = [np.array(range(1, random.randint(3, 8))) for x in range(10000)]

Edit: I need to parallelize list comprehensions like these:

temp = np.random.randint(1, high=100, size=10)  # determines the sizes of the subarrays
A, B, C = ([np.random.randint(0, high=2, size=x) for x in temp],  # high is exclusive, so this yields 0s and 1s
           [np.random.uniform(size=x) for x in temp],
           [np.random.uniform(size=x) for x in temp])
result = [[y if x == 1 else z for x, y, z in zip(a, b, c)]
          for a, b, c in zip(A, B, C)]

temp = np.random.randint(1, high=100, size=10)  # determines the sizes of the subarrays (high=100 assumed, as above)
D, E = ([np.random.uniform(size=x) for x in temp],
        [np.random.randint(1, high=10, size=x) for x in temp])  # range assumed; starts at 1 so the division below is safe
[[x / y for x, y in zip(d, np.maximum.accumulate(get_slice_max(e)))] for d, e in zip(D, E)]

Solution

Use numpy.maximum.accumulate:

# Sample
a = [np.random.randint(1,10,np.random.randint(3,8)) for _ in range(10000)]
a[:3]
# [array([4,5,6]),array([7,2,8,9,5]),array([5,1,7,5])]

[np.maximum.accumulate(arr) for arr in a]

Output:

[array([4,5,6]),array([7,7,8,9,9]),array([5,5,7,7]),...]

Verification:

all(np.array_equal(get_slice_max(arr),np.maximum.accumulate(arr)) for arr in a)
# True
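
The same substitution seems to carry over to the two comprehensions in the question's edit: the inner "y if x == 1 else z" loop reads like np.where, and the inner division can be done on whole arrays. The following is only a sketch under assumptions: the seeded generator, the variable names selected and ratios, and the value ranges (0/1 flags in A, integers from 1 to 9 in E) are illustrative choices, not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

# Ragged inputs shaped like the question's edit (sizes and ranges are assumed).
sizes = rng.integers(1, 100, size=10)
A = [rng.integers(0, 2, size=n) for n in sizes]  # 0/1 flags
B = [rng.uniform(size=n) for n in sizes]
C = [rng.uniform(size=n) for n in sizes]

# Element-wise "y if x == 1 else z", one vectorized call per subarray.
selected = [np.where(a == 1, b, c) for a, b, c in zip(A, B, C)]

# Element-wise division by the running maximum of each subarray.
D = [rng.uniform(size=n) for n in sizes]
E = [rng.integers(1, 10, size=n) for n in sizes]  # starts at 1 to avoid dividing by zero
ratios = [d / np.maximum.accumulate(e) for d, e in zip(D, E)]
```

The outer comprehension still runs in Python and keeps the original order of the subarrays; only the inner loops move into NumPy.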

Benchmark (roughly 5 times faster):

%timeit [np.maximum.accumulate(arr) for arr in a]
# 6.07 ms ± 498 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit [get_slice_max(arr) for arr in a]
# 32.4 ms ± 11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
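
If a function cannot be vectorized and parallelism is still wanted, one order-preserving option is Executor.map from the standard concurrent.futures module, which returns results in input order regardless of which worker finishes first. This is a minimal sketch, not a tuned solution: the helper name ordered_parallel_map, its executor_cls parameter, and the chunksize of 256 are hypothetical choices made for illustration.

```python
import concurrent.futures as cf
import random

import numpy as np


def get_slice_max(arr):
    """Running maximum of arr, as defined in the question."""
    result = [arr[0]]
    for i in range(1, len(arr)):
        result.append(max(result[-1], arr[i]))
    return result


def ordered_parallel_map(func, items, executor_cls=cf.ProcessPoolExecutor,
                         max_workers=None, chunksize=256):
    """Apply func to every item using a pool; results are in input order."""
    with executor_cls(max_workers=max_workers) as executor:
        # Executor.map yields results in the order of `items`,
        # not in the order in which the workers complete them.
        # chunksize batches items per task (used by process pools only).
        return list(executor.map(func, items, chunksize=chunksize))


if __name__ == "__main__":  # guard needed where worker processes are spawned
    a = [np.array(range(1, random.randint(3, 8))) for _ in range(10000)]
    result = ordered_parallel_map(get_slice_max, a)
    print(result[:3])
```

With subarrays this small, the cost of pickling data to and from the worker processes may well exceed the time saved, so it is worth timing this against the plain comprehension before adopting it.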

list-comprehension multiprocess parallel-processing performance python
