How do I avoid "Unnamed" columns when using Pandas read_csv? How do I drop Unnamed columns while reading a csv file?

Problem description

I have 20-30 csv files to read.

So I tried the code below:

import glob
import pandas as pd

pat_dir = ['file*.csv']
files_grabbed = []
# Collect every file that matches one of the patterns
for files in pat_dir:
    files_grabbed.extend(glob.glob(files))
for f in files_grabbed:
    df = pd.read_csv(f, sep=",", low_memory=False)
    print(f)
    print(df.columns)

Printing them gives output like the following:

file1.csv
Index(['Date','Code','Test','value','unit','TextualResults','subject_id','class_id','Unnamed: 8','Unnamed: 9','Unnamed: 10','Unnamed: 11','Unnamed: 12','Unnamed: 13','Unnamed: 14','Unnamed: 15','Unnamed: 16','Unnamed: 17','Unnamed: 18','Unnamed: 19','Unnamed: 20','Unnamed: 21','Unnamed: 22','Unnamed: 23','Unnamed: 24','Unnamed: 25','Unnamed: 26','Unnamed: 27','Unnamed: 28','Unnamed: 29','Unnamed: 30','Unnamed: 31','Unnamed: 32','Unnamed: 33','Unnamed: 34','Unnamed: 35','Unnamed: 36','Unnamed: 37','Unnamed: 38','Unnamed: 39','Unnamed: 40','Unnamed: 41','Unnamed: 42','Unnamed: 43','Unnamed: 44','Unnamed: 45','Unnamed: 46','Unnamed: 47','Unnamed: 48','Unnamed: 49','Unnamed: 50'],

I can get rid of the unnamed columns after read_csv with the code below:

df = df.loc[:,~df.columns.str.contains('^Unnamed')]

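But how can I avoid reading those unnamed columns during the read_csv operation itself?

Please note that I don't know the column names in advance, so I can't pass a list of column names to read_csv. Each file can have different column names.

So, is there a way to drop them during the read_csv operation? I have 30 files, and this causes problems during the glob operation.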

Solution

You can try:

import glob
import pandas as pd

pat_dir = ['file*.csv']
files_grabbed = []
for files in pat_dir:
    files_grabbed.extend(glob.glob(files))
for f in files_grabbed:
    df = pd.read_csv(f, sep=",", low_memory=False)
    # Drop every column whose name contains 'Unnamed'
    df = df.drop(columns=df.filter(like='Unnamed').columns)
    print(f)
    print(df.columns)
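
One detail worth checking before relying on filter(like='Unnamed'): it matches the substring anywhere in the column name, whereas the '^Unnamed' pattern in the question only matches at the start. A minimal sketch of the difference, using a made-up frame (the column name "Notes Unnamed" is purely hypothetical):

```python
import pandas as pd

# Hypothetical frame: one auto-named column plus a real column that
# happens to contain the word "Unnamed" in the middle of its name.
df = pd.DataFrame({"Date": [1], "Unnamed: 1": [None], "Notes Unnamed": ["x"]})

# like= does a substring match, so both "Unnamed" columns are dropped.
by_like = df.drop(columns=df.filter(like="Unnamed").columns)

# regex= with ^ only matches names that start with "Unnamed".
by_regex = df.drop(columns=df.filter(regex="^Unnamed").columns)

print(list(by_like.columns))
print(list(by_regex.columns))
```

If none of your real headers contain "Unnamed", the two behave the same.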

Or via pipe():

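import glob
import pandas as pd

pat_dir = ['file*.csv']
files_grabbed = []
for files in pat_dir:
    files_grabbed.extend(glob.glob(files))
for f in files_grabbed:
    # Pass the labels by keyword; recent pandas no longer accepts a positional axis in drop()
    df = pd.read_csv(f, low_memory=False).pipe(
        lambda x: x.drop(columns=x.filter(like='Unnamed').columns))
    print(f)
    print(df.columns)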

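How do I drop Unnamed columns while reading a csv file?

The Pandas read_csv method accepts an optional keyword argument named usecols, which selects a subset of columns from the csv file. The interesting thing about this argument is that it can also be a callable: the callable is evaluated against each column name, and only the columns for which it returns True are read.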
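
To see that behaviour in isolation, here is a small sketch on in-memory data. The csv text is made up for illustration; the two trailing commas in the header are what make pandas generate the "Unnamed: N" names.

```python
import io

import pandas as pd

# Made-up csv with three real columns and two empty trailing columns.
csv_text = "Date,Code,value,,\n2020-01-01,A1,3.5,,\n2020-01-02,B2,4.0,,\n"

# Default read: the empty headers become 'Unnamed: 3' and 'Unnamed: 4'.
raw = pd.read_csv(io.StringIO(csv_text))

# With a callable, only names for which it returns True are kept.
filtered = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: not name.startswith("Unnamed"),
)

print(list(raw.columns))
print(list(filtered.columns))
```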

Here is how to pass a callable in your example so that the Unnamed columns are never read in the first place.

for file in files_grabbed:
    df = pd.read_csv(file,low_memory=False,usecols=lambda c: not c.startswith('Unnamed:'))
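
For completeness, a self-contained sketch of the same idea applied to several files with different headers. The helper name read_without_unnamed and the temporary demo files are assumptions made for this example, not part of the question; in your case you would glob 'file*.csv' in your own directory.

```python
import glob
import os
import tempfile

import pandas as pd


def read_without_unnamed(path):
    """Read one csv, skipping columns that pandas auto-named 'Unnamed: N'."""
    return pd.read_csv(
        path,
        low_memory=False,
        usecols=lambda name: not name.startswith("Unnamed:"),
    )


# Demo on two throwaway files with different headers and empty trailing columns.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "file1.csv"), "w") as fh:
        fh.write("Date,Code,,\n2020-01-01,A1,,\n")
    with open(os.path.join(tmp, "file2.csv"), "w") as fh:
        fh.write("subject_id,value,unit,\n7,3.5,mg,\n")

    # Map each file name to its cleaned DataFrame.
    frames = {
        os.path.basename(p): read_without_unnamed(p)
        for p in sorted(glob.glob(os.path.join(tmp, "file*.csv")))
    }

for name, frame in frames.items():
    print(name, list(frame.columns))
```

Because the callable only looks at the header names, it works without knowing the real column names in advance.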