Problem description
I tried the following code:
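import glob
import pandas as pd

pat_dir = ['file*.csv']
files_grabbed = []
# Collect every file that matches one of the patterns
for files in pat_dir:
    files_grabbed.extend(glob.glob(files))
# Read each file and show which columns were parsed
for f in files_grabbed:
    df = pd.read_csv(f, sep=",", low_memory=False)
    print(f)
    print(df.columns)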
Printing them gives output like this:
file1.csv
Index(['Date','Code','Test','value','unit','TextualResults','subject_id','class_id','Unnamed: 8','Unnamed: 9','Unnamed: 10','Unnamed: 11','Unnamed: 12','Unnamed: 13','Unnamed: 14','Unnamed: 15','Unnamed: 16','Unnamed: 17','Unnamed: 18','Unnamed: 19','Unnamed: 20','Unnamed: 21','Unnamed: 22','Unnamed: 23','Unnamed: 24','Unnamed: 25','Unnamed: 26','Unnamed: 27','Unnamed: 28','Unnamed: 29','Unnamed: 30','Unnamed: 31','Unnamed: 32','Unnamed: 33','Unnamed: 34','Unnamed: 35','Unnamed: 36','Unnamed: 37','Unnamed: 38','Unnamed: 39','Unnamed: 40','Unnamed: 41','Unnamed: 42','Unnamed: 43','Unnamed: 44','Unnamed: 45','Unnamed: 46','Unnamed: 47','Unnamed: 48','Unnamed: 49','Unnamed: 50'],
I can drop the Unnamed columns after read_csv with the code below:
df = df.loc[:,~df.columns.str.contains('^Unnamed')]
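How can I avoid reading those Unnamed columns during the read_csv operation?
Please note that I don't know the column names in advance, so I can't pass a list of column names to read_csv. Each file can have different column names.
So, is there a way to drop them during the read_csv operation? I have 30 files, and this causes problems during the glob operation.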
Solution
You can try:
import glob
import pandas as pd

pat_dir = ['file*.csv']
files_grabbed = []
for files in pat_dir:
    files_grabbed.extend(glob.glob(files))
for f in files_grabbed:
    df = pd.read_csv(f, sep=",", low_memory=False)
    # Drop every column whose name contains 'Unnamed'
    df = df.drop(df.filter(like='Unnamed').columns, axis=1)
    print(f)
    print(df.columns)
Or, via pipe():
pat_dir = ['file*.csv']
files_grabbed = []
for files in pat_dir:
    files_grabbed.extend(glob.glob(files))
for f in files_grabbed:
    # Use the columns keyword; recent pandas versions reject a positional axis in drop()
    df = pd.read_csv(f, low_memory=False).pipe(
        lambda x: x.drop(columns=x.filter(like='Unnamed').columns)
    )
    print(f)
    print(df.columns)
How do I drop Unnamed columns while reading a csv file?
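The pandas read_csv method accepts an optional keyword argument named usecols, which selects a subset of columns from the csv file. The interesting thing about this argument is that it also accepts a callable. The callable is evaluated against each column name, and only the columns for which it returns True are read.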
Here is how you can pass a callable in your example so that the Unnamed columns are never read in the first place.
for file in files_grabbed:
    df = pd.read_csv(file, low_memory=False, usecols=lambda c: not c.startswith('Unnamed:'))
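To see the effect without touching any files on disk, here is a minimal sketch that runs the same usecols callable against an in-memory csv. The sample text and the helper name keep_named are made up for illustration; the trailing commas in the header are assumed to be what produces the Unnamed columns in the real files.

```python
import io
import pandas as pd

# Hypothetical sample: the two trailing commas give the header two empty
# column names, which pandas labels 'Unnamed: 3' and 'Unnamed: 4'.
csv_text = "Date,Code,value,,\n2020-01-01,A,1,,\n2020-01-02,B,2,,\n"


def keep_named(col):
    """Return True for columns that should be read (illustrative helper)."""
    return not col.startswith("Unnamed")


# Default read: the empty header cells come back as 'Unnamed: N' columns.
df_all = pd.read_csv(io.StringIO(csv_text))

# Filtered read: the callable is applied to each column name while parsing.
df_named = pd.read_csv(io.StringIO(csv_text), usecols=keep_named)

print(list(df_all.columns))
print(list(df_named.columns))
```

Because the filter works on names rather than a fixed list, it should carry over to files whose real column names differ, as long as none of the genuine columns start with "Unnamed".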