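Problem description
I am using Python on a server with several hundred GB of RAM (confirmed
with psutil) to work with a 6.5 GB dataset. Loading the file into pandas
fails with a MemoryError. This is the psutil output: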
import psutil
psutil.virtual_memory()
svmem(total=405042839552,available=254328373248,percent=37.2,used=148782104576,free=148047446016,active=79192813568,inactive=96666456064,buffers=20480,cached=108213268480,shared=767070208,slab=4305301504)
psutil reports 254.3 GB of available RAM, but when I try to load the 6.5 GB file I get the following traceback:
# filename is the path to the 6.5 GB file
df = pd.read_table(filename,sep='\t')
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-8-0b957ec637b5> in <module>
----> 2 df = pd.read_table(filename,sep='\t')
/hpc/packages/minerva-centos7/py_packages/3.7/lib/python3.7/site-packages/pandas/io/parsers.py in read_table(filepath_or_buffer,sep,delimiter,header,names,index_col,usecols,squeeze,prefix,mangle_dupe_cols,dtype,engine,converters,true_values,false_values,skipinitialspace,skiprows,skipfooter,nrows,na_values,keep_default_na,na_filter,verbose,skip_blank_lines,parse_dates,infer_datetime_format,keep_date_col,date_parser,dayfirst,cache_dates,iterator,chunksize,compression,thousands,decimal,lineterminator,quotechar,quoting,doublequote,escapechar,comment,encoding,dialect,error_bad_lines,warn_bad_lines,delim_whitespace,low_memory,memory_map,float_precision)
765 # default to avoid a ValueError
766 sep = ","
--> 767 return read_csv(**locals())
768
769
/hpc/packages/minerva-centos7/py_packages/3.7/lib/python3.7/site-packages/pandas/io/parsers.py in read_csv(filepath_or_buffer,float_precision)
686 )
687
--> 688 return _read(filepath_or_buffer,kwds)
689
690
/hpc/packages/minerva-centos7/py_packages/3.7/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer,kwds)
458
459 try:
--> 460 data = parser.read(nrows)
461 finally:
462 parser.close()
/hpc/packages/minerva-centos7/py_packages/3.7/lib/python3.7/site-packages/pandas/io/parsers.py in read(self,nrows)
1196 def read(self,nrows=None):
1197 nrows = _validate_integer("nrows",nrows)
-> 1198 ret = self._engine.read(nrows)
1199
1200 # May alter columns / col_dict
/hpc/packages/minerva-centos7/py_packages/3.7/lib/python3.7/site-packages/pandas/io/parsers.py in read(self,nrows)
2155 def read(self,nrows=None):
2156 try:
-> 2157 data = self._reader.read(nrows)
2158 except StopIteration:
2159 if self._first_chunk:
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()
pandas/_libs/parsers.pyx in pandas._libs.parsers._concatenate_chunks()
<__array_function__ internals> in concatenate(*args,**kwargs)
MemoryError: Unable to allocate 15.3 MiB for an array with shape (2003397,) and data type float64
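For scale, the allocation that failed is tiny compared with the memory psutil reports as available. A quick arithmetic check, using only the numbers copied from the psutil output and the error message above (the variable names are just for illustration):

```python
# Numbers copied from the psutil output and the MemoryError message above.
available_bytes = 254328373248   # svmem.available
rows = 2003397                   # array shape reported in the error
bytes_per_float64 = 8            # a float64 element occupies 8 bytes

failed_alloc_bytes = rows * bytes_per_float64
failed_alloc_mib = failed_alloc_bytes / 2**20   # binary mebibytes
available_gb = available_bytes / 1e9            # decimal gigabytes
available_gib = available_bytes / 2**30         # binary gibibytes

print(f"failed allocation: {failed_alloc_mib:.1f} MiB")
print(f"available: {available_gb:.1f} GB ({available_gib:.1f} GiB)")
```

The 15.3 MiB in the error message matches one float64 column of 2,003,397 rows, so the failure is not caused by a single oversized request.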
Solution
No working solution to this problem has been found yet.
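One thing that may be worth ruling out is a per-process memory limit. psutil.virtual_memory() describes the whole machine, not what a single process is allowed to allocate, and on shared HPC systems a process can be capped well below the machine total. The sketch below is a diagnostic only, not a confirmed fix: it assumes a Linux or other Unix host (the standard-library resource module is not available on Windows), and describe_limit is a hypothetical helper name.

```python
import resource


def describe_limit(name):
    """Return the (soft, hard) limit for a resource as readable strings.

    `name` is the name of a constant in the `resource` module,
    for example "RLIMIT_AS" (total address space of the process).
    """
    soft, hard = resource.getrlimit(getattr(resource, name))

    def fmt(value):
        # RLIM_INFINITY means no limit is set for this resource.
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value / 2**30:.2f} GiB"

    return fmt(soft), fmt(hard)


# Address-space and data-segment limits are the ones most likely
# to turn a small allocation into a MemoryError.
for name in ("RLIMIT_AS", "RLIMIT_DATA"):
    soft, hard = describe_limit(name)
    print(f"{name}: soft={soft}, hard={hard}")
```

If both limits print as unlimited, the cap may still come from elsewhere, for example a memory limit applied by a batch scheduler through cgroups, which this check does not show.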