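python - Pandas dataframe read_csv on bad data

I want to read a very large CSV (too large to open and edit easily in Excel), but somewhere around row 100,000 there is a row with an extra column, which makes the program crash. That row is erroneous, so I need a way to ignore the row with the extra column. There are about 50 columns, so hard-coding the headers and using names or usecols is undesirable. I may also run into this problem in other CSVs, so I would like a general solution. Unfortunately, I could not find anything for this in read_csv. The code is as simple as this: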

import pandas as pd

def loadCSV(filePath):
    # index_col=False stops pandas from using the first column as the index
    dataframe = pd.read_csv(filePath, index_col=False, encoding='iso-8859-1', nrows=1000)
    datakeys = dataframe.keys()
    return dataframe, datakeys

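Solution:

Pass error_bad_lines=False to skip the bad lines. Note that this parameter was deprecated in pandas 1.3 in favor of on_bad_lines='skip' and was removed in pandas 2.0. The documentation for the older parameter reads: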

error_bad_lines : boolean, default True Lines with too many fields
(e.g. a csv line with too many commas) will by default cause an
exception to be raised, and no DataFrame will be returned. If False,
then these “bad lines” will be dropped from the DataFrame that is
returned. (Only valid with C parser)
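
As an illustration of skipping a row with too many fields, here is a minimal sketch. It is not the answerer's code: the sample data and the helper name load_csv_skipping_bad_lines are hypothetical. It assumes pandas 1.3 or later, where the keyword is on_bad_lines='skip', and falls back to error_bad_lines=False on older releases that do not know the newer keyword.

```python
import io

import pandas as pd

# Hypothetical sample data: the third data row has one field too many.
csv_text = "a,b,c\n1,2,3\n4,5,6\n7,8,9,10\n11,12,13\n"


def load_csv_skipping_bad_lines(source):
    """Read a CSV and drop rows that have more fields than the header."""
    try:
        # pandas >= 1.3: skip malformed rows instead of raising an error
        return pd.read_csv(source, on_bad_lines="skip")
    except TypeError:
        # Older pandas rejects the unknown keyword before reading anything,
        # so retry with the parameter quoted in the answer above.
        return pd.read_csv(source, error_bad_lines=False)


df = load_csv_skipping_bad_lines(io.StringIO(csv_text))
print(df)
```

The row with four fields is dropped and the remaining rows load with the header's three columns. Because nothing about the column names is hard-coded, the same call can be reused on other files, which is what the question asks for.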
