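Problem description
I want to load a 5.9 GB CSV file, and I am not using the pandas library. I have 4 GPUs. I am using rapids.ai to load this large dataset faster, but every attempt fails with the error shown further down, even though my other GPUs have free memory. Before loading, the GPU memory usage (in bytes) is: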
GPU 0
total : 11554717696
free : 11126046720
used : 428670976
GPU 1
total : 11554717696
free : 11542331392
used : 12386304
GPU 2
total : 11554717696
free : 11542331392
used : 12386304
GPU 3
total : 11551440896
free : 11113070592
used : 438370304
The code is:
import cudf
import pandas as pd
import time
import subprocess as sp
import os
import dask_cudf
name = 'T100'
path = '/media/mo/2438a3d1-29fe-4c6f-aafb-f906acd5140d/aimD/c1/trajs/'+name+'.CSV'
start = time.time()
data = dask_cudf.from_cudf(cudf.read_csv(path),npartitions=4).compute()
done = time.time()
elapsed = done - start
print(elapsed)
Error message:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-3-1fff5fb4e9b4> in <module>
2
3
----> 4 data = dask_cudf.from_cudf(cudf.read_csv(path),
5 npartitions=4).compute()
6 done = time.time()
~/anaconda3/envs/machineLearning/lib/python3.7/contextlib.py in inner(*args,**kwds)
72 def inner(*args,**kwds):
73 with self._recreate_cm():
---> 74 return func(*args,**kwds)
75 return inner
76
~/anaconda3/envs/machineLearning/lib/python3.7/site-packages/cudf/io/csv.py in read_csv(filepath_or_buffer,lineterminator,quotechar,quoting,doublequote,header,mangle_dupe_cols,usecols,sep,delimiter,delim_whitespace,skipinitialspace,names,dtype,skipfooter,skiprows,dayfirst,compression,thousands,decimal,true_values,false_values,nrows,byte_range,skip_blank_lines,parse_dates,comment,na_values,keep_default_na,na_filter,prefix,index_col,**kwargs)
82 na_filter=na_filter,
83 prefix=prefix,
---> 84 index_col=index_col,
85 )
86
cudf/_lib/csv.pyx in cudf._lib.csv.read_csv()
MemoryError: std::bad_alloc: CUDA error at: /conda/conda-bld/librmm_1591196551527/work/include/rmm/mr/device/cuda_memory_resource.hpp:66: cudaErrorMemoryAllocation out of memory
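Solution
In the code above, cudf.read_csv(path) reads the entire file onto a single GPU before dask_cudf.from_cudf gets a chance to partition it, so the free memory on the other GPUs is never used.
The answer to the question "CUDF error processing a large number of parquet files" explains how to read large files with dask_cudf: https://stackoverflow.com/a/58123478/13887495
Following the instructions in that answer should help you resolve the MemoryError: std::bad_alloc: CUDA error at: /conda/conda-bld/librmm_1591196551527/work/include/rmm/mr/device/cuda_memory_resource.hpp:66: cudaErrorMemoryAllocation out of memory
The code should be: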
data = dask_cudf.read_csv(path,npartitions=4)
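To actually spread the work over all four GPUs, dask_cudf.read_csv is usually paired with a dask_cuda cluster. Below is a minimal sketch, not a definitive setup: it assumes dask_cuda and dask.distributed are installed alongside cudf, and the function names approx_partition_count and read_csv_across_gpus are hypothetical helpers introduced only for illustration. The 256 MiB chunk size in the example is an assumption; the sketch itself leaves the partition size at the library default.

```python
import math


def approx_partition_count(file_bytes, chunk_bytes):
    """Rough estimate of how many chunks a file is split into.

    Hypothetical helper: the real partition count is decided by the
    library and may differ slightly from this estimate.
    """
    return math.ceil(file_bytes / chunk_bytes)


def read_csv_across_gpus(path):
    """Read a CSV lazily in chunks and keep it distributed over the GPUs.

    The GPU libraries are imported here so that this module can still be
    imported on a machine without RAPIDS installed.
    """
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster
    import dask_cudf

    # By default LocalCUDACluster starts one worker per visible GPU.
    cluster = LocalCUDACluster()
    client = Client(cluster)

    # Each partition is parsed by whichever worker picks it up, so no
    # single GPU has to hold the whole file while it is being read.
    ddf = dask_cudf.read_csv(path)

    # persist() keeps the partitions on the workers.
    ddf = ddf.persist()
    return client, ddf


if __name__ == "__main__":
    # A 5.9 GB file in 256 MiB chunks (assumed chunk size) gives about 22 partitions.
    print(approx_partition_count(5_900_000_000, 256 * 2**20))
```

Calling .compute() on the result would gather every partition into one cudf DataFrame on a single GPU, which can reproduce the original error, so the sketch keeps the data distributed with .persist() instead.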