Numpy shared_memory array reset to zero in Pool

Problem description

I am trying to share a large 3-dimensional numpy array with a process Pool so that I can run some operations on slices of that large array. In my main:

_dtype = np.dtype('float64')
n_rotations, n_coords, n_points = 7000, 3, 25600
shm = shared_memory.SharedMemory(
    create=True, size=n_rotations * n_coords * n_points * _dtype.itemsize)
rotations_name = shm.name
coordinates = np.ndarray(
    (n_rotations, n_points), dtype=_dtype, buffer=shm.buf)
coordinates = rotations @ ellipsoid
print(coordinates.shape)  # outputs (n_rotations, n_points)

chunks = [(rot_idx, rotations_name, args.output, (n_rotations, n_points), max_rad)
          for rot_idx in range(n_rotations)]
pool = Pool(args.processes)
_res = pool.starmap_async(gen_features, chunks).get()

gen_features is defined as follows:

def gen_features(idx: int, buf_name: str, _dir: str, rot_dims: tuple, max_rad: int):
    shm = shared_memory.SharedMemory(name=buf_name)
    rotations = np.ndarray(rot_dims, dtype=np.dtype('float64'), buffer=shm.buf)
    print(rotations)  # here the np array has become zero-filled for some reason
    del rotations  # drop the view on shm.buf before closing
    shm.close()
    return idx

Solution

After almost an hour of debugging, it turns out that you have to copy the data into the shared buffer, as described in this section:

b[:] = a[:]  # Copy the original data into shared memory
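To see why that one line matters, here is a standalone sketch (not the original code, just illustrative names) contrasting plain assignment with in-place assignment into a shared_memory-backed array:

```python
import numpy as np
from multiprocessing import shared_memory

a = np.arange(6, dtype=np.float64)
shm = shared_memory.SharedMemory(create=True, size=a.nbytes)

b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
b = a * 2  # rebinds the name 'b' to a brand-new array; shm.buf is never written

view = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
print(view)  # all zeros: the shared buffer was never touched

c = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
c[:] = a * 2  # in-place: writes the result into the shared buffer
print(view)  # [ 0.  2.  4.  6.  8. 10.]

del b, c, view  # release views on shm.buf before closing
shm.close()
shm.unlink()
```

`b = a * 2` simply points the name `b` at the new array returned by the multiplication, while `c[:] = a * 2` assigns element-wise through the existing view, so the bytes land in the shared segment.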

Basically, this:

coordinates[:] = rotations @ ellipsoid

instead of

coordinates = rotations @ ellipsoid
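Putting the fix together, here is a minimal self-contained version of the whole pattern (tiny hypothetical dimensions and a simple `reader` worker standing in for gen_features):

```python
import numpy as np
from multiprocessing import Pool, shared_memory

def reader(idx, buf_name, dims):
    # Attach to the existing segment by name and view it as an ndarray.
    shm = shared_memory.SharedMemory(name=buf_name)
    arr = np.ndarray(dims, dtype=np.float64, buffer=shm.buf)
    total = float(arr[idx].sum())
    del arr  # drop the view before closing, or close() raises BufferError
    shm.close()
    return total

if __name__ == '__main__':
    dims = (4, 3)
    shm = shared_memory.SharedMemory(
        create=True, size=int(np.prod(dims)) * np.dtype('float64').itemsize)
    data = np.ndarray(dims, dtype=np.float64, buffer=shm.buf)
    data[:] = np.arange(12, dtype=np.float64).reshape(dims)  # in-place copy

    with Pool(2) as pool:
        chunks = [(i, shm.name, dims) for i in range(dims[0])]
        res = pool.starmap(reader, chunks)
    print(res)  # [3.0, 12.0, 21.0, 30.0]

    del data
    shm.close()
    shm.unlink()
```

Because `data[:] = ...` writes through the view into the segment, every worker that re-attaches by name sees the filled array instead of zeros.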