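Problem description
I am very new to Python and am trying to figure out the following:
I have multiple CSV files (one per month) that I am trying to combine into a single yearly file. Every monthly file has a header row, so I want to keep the first header and drop the rest. I used the script below to do this, but the output has 10 blank rows between each month.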
import shutil
import glob
# import csv files from folder
path = r'data/US/market/merged_data'
allFiles = glob.glob(path + "/*.csv")
allFiles.sort()  # glob lacks reliable ordering, so impose your own if output order matters
with open('someoutputfile.csv', 'wb') as outfile:
    for i, fname in enumerate(allFiles):
        with open(fname, 'rb') as infile:
            if i != 0:
                infile.readline()  # Throw away header on all but first file
            # Block copy rest of file from input to output without parsing
            shutil.copyfileobj(infile, outfile)
            print(fname + " has been imported.")
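Thanks in advance!
Solution
Assuming the dataset is not larger than your memory, I suggest reading each file into pandas, concatenating the DataFrames, and filtering from there. Blank rows will probably show up as NaN.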
import glob
import pandas as pd
path = r'data/US/market/merged_data'
allFiles = glob.glob(path + "/*.csv")
allFiles.sort()
# read each monthly file and concatenate them into one DataFrame
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
df = pd.concat([pd.read_csv(fname) for fname in allFiles], ignore_index=True)
# drop rows in which every column is empty
df = df.dropna(how='all')
# write to file, leaving out the extra index column
df.to_csv('someoutputfile.csv', index=False)
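If the files turn out to be too large for memory, a streaming variant that uses only the standard library may be enough. The sketch below is not from the original answer: the function name merge_csv_files is made up for illustration, and it assumes that the first line of every monthly file is the header and that all files share the same columns. It keeps the first header it sees and skips any row that is empty or contains only empty fields.

```python
import csv
import glob
import os


def merge_csv_files(pattern, out_path):
    """Merge the CSV files matching `pattern` into `out_path`.

    Hypothetical helper: assumes every input file starts with the same
    header row. Returns the number of data rows written.
    """
    out_abs = os.path.abspath(out_path)
    # Sort for a predictable order, and never read the output file itself
    # in case it also matches the pattern.
    files = sorted(f for f in glob.glob(pattern)
                   if os.path.abspath(f) != out_abs)
    header_written = False
    rows_written = 0
    with open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for fname in files:
            with open(fname, newline="") as infile:
                reader = csv.reader(infile)
                header = next(reader, None)  # None means the file is empty
                if header is None:
                    continue
                if not header_written:
                    writer.writerow(header)  # keep only the first header
                    header_written = True
                for row in reader:
                    # A blank line parses as [], a line of bare commas as
                    # ['', '', ...]; both are skipped here.
                    if any(field.strip() for field in row):
                        writer.writerow(row)
                        rows_written += 1
    return rows_written
```

With the paths from the question, the call would be merge_csv_files('data/US/market/merged_data/*.csv', 'someoutputfile.csv'). Because rows are written one at a time, memory use stays small regardless of file size, at the cost of parsing every row instead of block-copying it.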