Optimizing zarr array processing

Problem description

I have a list (mylist) of 80 5-D zarr files with the structure (T, F, B, Az, El). Each array has shape [24x4096x2016x24x8].
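To give a sense of scale, the shape above works out to about 38 billion elements per array. The short sketch below does that arithmetic. The element type is not stated in the question, so the item sizes in the loop are assumptions for illustration only.

```python
from math import prod

shape = (24, 4096, 2016, 24, 8)  # (T, F, B, Az, El), as given in the question
n_elements = prod(shape)         # number of elements in one 5-D array
print(f"{n_elements:,} elements per array")

# The dtype is not stated in the question; these item sizes are assumptions.
for itemsize in (1, 4, 8):
    gib = n_elements * itemsize / 2**30
    print(f"{itemsize} byte(s) per element: {gib:.1f} GiB uncompressed")
```

Each file holds two such variables (master and counter), and there are 80 files, so any reduction that touches the full arrays has to stream a very large volume of data.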

I want to extract a slice of the data and compute a probability along some of the axes using the following function:

import numpy as np
import xarray as xr

def GetPolarData(mylist,freq,FreqLo,FreqHi):
    '''
    Take the list of zarr files (T,F,B,Az,El), open them, and use the selected
    frequency band to return a list of arrays with Azimuth and Elevation probabilities.
    '''

    # FreqCut is not shown here; it is expected to return the channel indices inside the band
    ChanIndx = FreqCut(FreqLo,FreqHi,freq)

    # Nothing to do if no channel falls inside the requested band
    if len(ChanIndx) == 0:
        print("Something went wrong in Frequency selection")
        return []

    MyData = []
    for i in range(len(mylist)):
        print('Adding file {} : {}'.format(i,mylist[i][32:]))
        try:
            zarrf = xr.open_zarr(mylist[i],group = 'arr')
            # Sum over time and baseline, then over the selected channels
            m = zarrf.master.sum(dim = ['time','baseline'])
            m = m[ChanIndx].sum(dim = ['frequency'])

            c = zarrf.counter.sum(dim = ['time','baseline'])
            c = c[ChanIndx].sum(dim = ['frequency'])

            # Ratio of master to counter per (Az, El) cell
            p = m.astype(float)/c.astype(float)

            MyData.append(p)

        except Exception as e:
            print(e)
            continue

    print("##########################################")
    print("This will be contribution to selected band")
    print("##########################################")

    # Skip the statistics if every file failed to load
    if MyData:
        print(f"Min {np.nanmin(MyData)*100:.3f}%  ")
        print(f"Max {np.nanmax(MyData)*100:.3f}%  ")
        print(f"Average {np.nanmean(MyData)*100:.3f}%  ")
    return MyData

If I call the function as follows

FreqLo = 470.
FreqHi = 854.
MyTVData = np.array(GetPolarData(AllZarrList,Freq,FreqLo,FreqHi))

I find that it takes a very long time (more than 3 hours) to run on a machine with 40 cores and 256 GB of RAM.

Is there a way to make it run faster?

Thanks

Solution
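One direction that may help, offered as a sketch under assumptions rather than a verified fix: GetPolarData sums master and counter over time and baseline for all 4096 channels and only then picks the band. Selecting the channels first, and reducing over all three axes in a single pass, should give the same result while touching less data. The names band_probability and get_polar_data are hypothetical, the channel selection is assumed to be a list of integer positions along the frequency dimension, and the dimension names are taken from the question's code.

```python
import numpy as np
import xarray as xr


def band_probability(ds, chan_idx):
    """Return master/counter summed over time, baseline and the selected channels.

    ds       : xarray.Dataset holding 'master' and 'counter' with dims that
               include 'time', 'frequency' and 'baseline' (names from the question)
    chan_idx : integer positions of the wanted channels (an assumption)
    """
    # Select the band first so channels outside it are never summed.
    band = ds[["master", "counter"]].isel(frequency=chan_idx)
    # One reduction over all three axes instead of two separate passes.
    sums = band.sum(dim=["time", "frequency", "baseline"])
    return sums["master"].astype(float) / sums["counter"].astype(float)


def get_polar_data(paths, chan_idx):
    """Hypothetical variant of the loop: one small (Az, El) result per zarr file."""
    results = []
    for path in paths:
        ds = xr.open_zarr(path, group="arr")  # lazy, dask-backed
        # compute() evaluates the lazy graph once and keeps the small result in memory.
        results.append(band_probability(ds, chan_idx).compute())
    return results
```

Calling compute() once per file also means the later np.nanmin, np.nanmax, np.nanmean and np.array calls work on in-memory results; as far as I can tell, passing lazy dask-backed arrays to those functions would re-evaluate the reductions each time. How much the early selection saves depends on how the zarr arrays are chunked, since a chunk is read in full even when only part of it is needed, and on how much of the 4096 channels the band covers.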
