Python: merge CSV files, drop the repeated headers, and remove blank rows

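Problem description

I'm very new to Python and am trying to figure out the following:

I have multiple CSV files (one per month) that I'm trying to merge into a single yearly file. Each monthly file has a header, so I want to keep the first header and drop the rest. I used the following script to do this, but it leaves 10 blank lines between each month.

Does anyone know what I could add to remove the blank lines?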

import shutil
import glob


# Import CSV files from folder
path = r'data/US/market/merged_data'
allFiles = glob.glob(path + "/*.csv")
allFiles.sort()  # glob lacks reliable ordering, so impose your own if output order matters
with open('someoutputfile.csv', 'wb') as outfile:
    for i, fname in enumerate(allFiles):
        with open(fname, 'rb') as infile:
            if i != 0:
                infile.readline()  # Throw away header on all but first file
            # Block copy rest of file from input to output without parsing
            shutil.copyfileobj(infile, outfile)
            print(fname + " has been imported.")

Thanks in advance!

Solution

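Assuming the data set fits in memory, I'd suggest reading each file with pandas, concatenating the DataFrames, and filtering from there. Lines that are completely empty are skipped by read_csv by default (skip_blank_lines=True); rows that contain only delimiters will most likely show up as rows of NaN, which dropna can remove.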

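import glob

import pandas as pd

path = r'data/US/market/merged_data'
allFiles = glob.glob(path + "/*.csv")
allFiles.sort()  # glob does not guarantee any ordering
# Read each monthly file; its header row becomes the column names
frames = [pd.read_csv(fname) for fname in allFiles]
# Stack the monthly frames into one (DataFrame.append was removed in pandas 2.0)
df = pd.concat(frames, ignore_index=True)
# Drop rows in which every column is NaN, which should remove the blank rows
df = df.dropna(how='all')
# Write to file; index=False keeps the row index out of the output
df.to_csv('someoutputfile.csv', index=False)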
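
If pandas is not available, or the files are too large to hold in memory, the streaming approach from the question could be adapted to skip blank lines as it copies. The following is only a minimal sketch under a few assumptions: a "blank" line is one that contains nothing but whitespace and commas, no quoted field spans several lines, and the output file does not itself match the input pattern. The helper name merge_csv_files is hypothetical and not part of any library.

```python
import glob


def merge_csv_files(pattern, out_path):
    """Hypothetical helper: merge the CSV files matching pattern into out_path.

    Keeps the header of the first file only and skips blank lines.
    """
    # Sort so the output order is deterministic; glob gives no ordering guarantee
    file_names = sorted(glob.glob(pattern))
    with open(out_path, 'w', newline='') as outfile:
        for i, fname in enumerate(file_names):
            with open(fname, 'r', newline='') as infile:
                header = infile.readline()
                if i == 0:
                    # Keep the header from the first file only
                    outfile.write(header.rstrip('\r\n') + '\n')
                for line in infile:
                    # Assumption: only whitespace and commas means a blank row
                    if not line.replace(',', '').strip():
                        continue
                    # Normalize the line ending so files without a trailing
                    # newline do not run into the next file's first row
                    outfile.write(line.rstrip('\r\n') + '\n')
    return file_names
```

With the paths from the question, a call might look like merge_csv_files('data/US/market/merged_data/*.csv', 'someoutputfile.csv'). Because it works line by line, memory use stays small regardless of file size, at the cost of not validating the CSV structure.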