Problem Description
I have a large dataset in .npy format with shape (500000, 18). To feed it into a conv2D network through a generator, I split it into X and y and reshaped them to (-1,96,10,10,17) and (-1,1) respectively. However, when I feed it into the model, I get a memory error:
2020-08-26 14:37:03.691425: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 462080 totalling 451.2KiB
2020-08-26 14:37:03.691432: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 515840 totalling 503.8KiB
2020-08-26 14:37:03.691438: I tensorflow/core/common_runtime/bfc_allocator.cc:816] Sum Total of in-use chunks: 22.89GiB
2020-08-26 14:37:03.691445: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 24576286720 memory_limit_: 68719476736 available bytes: 44143190016 curr_region_allocation_bytes_: 34359738368
2020-08-26 14:37:03.691455: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
Limit: 68719476736
InUse: 24576278528
MaxInUse: 24576278784
NumAllocs: 140334
MaxAllocSize: 268435456
I am using a GPU with 32 GB of memory.
I have tried different strategies, without success. First, numpy.memmap:
import numpy as np

def Meu_Generador_4(path, batch_size, tempo, janela):
    # np.load with mmap_mode reads the shape from the .npy header
    # without pulling the whole array into RAM.
    data = np.load(path, mmap_mode='r')
    total = data.shape[0]
    number_of_batches = total // batch_size
    # Create a memmap array on disk to stage the output ('w+' creates the file).
    y_output = np.memmap('output', dtype='float64', mode='w+', shape=(total, 18))
    counter = 0
    while 1:
        start = counter * batch_size  # advance by whole batches, not single rows
        y_output[start:start + batch_size] = data[start:start + batch_size]
        X, y = input_3D(y_output[start:start + batch_size], janela)
        y = y.reshape(-1, 1)
        counter += 1
        yield X.reshape(-1, 96, 10, 17), y
        print('AQUI')
        # Restart counter to yield data in the next epoch as well
        if counter >= number_of_batches:
            counter = 0
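For reference, a minimal self-contained sketch of the same pattern, stripped of the question-specific `input_3D` and network shapes; the file name and array sizes below are illustrative only:

```python
import os
import tempfile
import numpy as np

def npy_batch_generator(path, batch_size):
    """Yield successive row batches from a .npy file without loading it whole.

    np.load(..., mmap_mode='r') parses only the header, so the shape is
    known up front and rows are paged in from disk on demand.
    """
    data = np.load(path, mmap_mode='r')
    number_of_batches = data.shape[0] // batch_size
    counter = 0
    while True:
        start = counter * batch_size
        # np.asarray copies just this batch into RAM.
        yield np.asarray(data[start:start + batch_size])
        counter += 1
        if counter >= number_of_batches:  # restart for the next epoch
            counter = 0

# Toy demonstration with a small illustrative file.
path = os.path.join(tempfile.mkdtemp(), 'toy.npy')
np.save(path, np.arange(100 * 18, dtype='float64').reshape(100, 18))
gen = npy_batch_generator(path, batch_size=10)
first = next(gen)
print(first.shape)  # (10, 18)
```

Only one batch at a time is materialised in RAM; the memory-mapped array itself stays on disk.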
Or a dask delayed array:
import numpy as np
import dask.array as da

def Meu_Generador_3(path, batch_size, janela):
    # Read the row count from the .npy header via memory-mapping.
    source = np.load(path, mmap_mode='r')
    samples_per_epoch = source.shape[0]
    number_of_batches = int(np.floor(samples_per_epoch / batch_size))
    # Chunk by batch so each delayed block holds exactly one batch.
    data = da.from_array(source, chunks=(batch_size, 18))
    data = data.to_delayed()
    counter = 0
    while 1:
        chunk = da.from_delayed(data[counter][0], shape=(batch_size, 18),
                                dtype=source.dtype)
        X, y = input_3D(chunk.compute(), janela)
        counter += 1
        # Reshape to the network input, as in the first generator.
        yield X.reshape(-1, 96, 10, 17), y
        print("AQUI")
        # Restart counter to yield data in the next epoch as well
        if counter >= number_of_batches:
            counter = 0
I know I could split the file into many smaller files, but I would prefer not to do that. Thank you.
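As a side note, both generators count rows with `sum(1 for line in np.load(path))`, which materialises the entire array in RAM just to obtain its length; with `mmap_mode='r'` the shape comes from the .npy header for free. A small illustrative comparison (file name and size are stand-ins):

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'big.npy')
np.save(path, np.zeros((500, 18), dtype='float64'))

# Expensive: loads every row into RAM just to count them.
total_slow = sum(1 for row in np.load(path))

# Cheap: mmap_mode='r' parses only the .npy header; no data is read.
total_fast = np.load(path, mmap_mode='r').shape[0]

print(total_slow, total_fast)  # 500 500
```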