How can I speed up reading a DBF file into a DataFrame in Python?

Problem description

I am using the following routine, dbf2DF (https://gist.github.com/ryan-hill/f90b1c68f60d12baea81), to read .dbf files into a DataFrame.

import pysal as ps
import pandas as pd


def dbf2DF(dbfile, upper=True):
    '''
    Read a DBF file and return it as a pandas DataFrame.

    Arguments
    ---------
    dbfile  : DBF file - Input to be imported
    upper   : Condition - If true, make column heads upper case
    '''
    db = ps.open(dbfile)  # PySAL 1.x API (libpysal.io.open in PySAL 2.x)
    d = {col: db.by_col(col) for col in db.header}  # Convert DBF to a dict of columns
    # pandasDF = pd.DataFrame(db[:])  # Convert to pandas DF
    pandasDF = pd.DataFrame(d)  # Convert to pandas DF
    if upper:  # Make columns uppercase if wanted
        # Use a list (a bare map object is not a valid index in Python 3)
        pandasDF.columns = [col.upper() for col in pandasDF.columns]
    db.close()
    return pandasDF

It does what I need, but it is slow: 1.7 million records take 56 seconds.

Of that time, 54 seconds are spent on the following line:

d = {col: db.by_col(col) for col in db.header} #Convert dbf to dictionary
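
To see whether those 54 seconds are spread evenly across columns or concentrated in a few of them, each by_col call can be timed on its own. The helper below is only a generic sketch: time_calls is a hypothetical name, not part of PySAL or pandas, and in the routine above it would be called as time_calls(db.by_col, db.header).

```python
import time


def time_calls(func, keys):
    """Call func(key) for each key; return ({key: result}, {key: seconds})."""
    results, seconds = {}, {}
    for key in keys:
        start = time.perf_counter()
        results[key] = func(key)  # same call the dict comprehension makes
        seconds[key] = time.perf_counter() - start
    return results, seconds
```

The comprehension runs only once per column, so if every column takes a similar share of the time, the cost most likely sits inside by_col rather than in the loop itself.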

My question: can this line be sped up by eliminating the "for" loop?

Solution

No confirmed solution to this problem has been found yet.
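
One direction that may be worth trying, offered as an untested sketch rather than a confirmed fix: the dictionary comprehension loops only once per column, so removing the "for" by itself is unlikely to help, and the time is presumably spent inside each by_col call. Under that assumption, the file can be read once and its fixed-width records split with the standard struct module, bypassing by_col entirely. The names read_dbf_columns and _convert are hypothetical and not part of PySAL or pandas. The sketch assumes the standard dBASE layout (32-byte header, 32-byte field descriptors ended by 0x0D), converts only C, N and F fields, and has not been timed against the 1.7-million-record file.

```python
import struct


def _convert(raw, ftype, decimals, encoding):
    """Turn one fixed-width DBF field into a Python value."""
    text = raw.decode(encoding)
    if ftype in ("N", "F"):
        text = text.strip()
        if not text:
            return None  # blank numeric field
        return int(text) if decimals == 0 and "." not in text else float(text)
    return text.rstrip()  # C and any other type: text without padding


def read_dbf_columns(path, encoding="latin-1"):
    """Read a DBF file in one pass and return {field name: list of values}."""
    with open(path, "rb") as f:
        data = f.read()

    # File header: record count (uint32), header size and record size (uint16).
    n_records, header_len, record_len = struct.unpack_from("<IHH", data, 4)

    # 32-byte field descriptors follow, ended by a 0x0D byte.
    fields = []
    pos = 32
    while data[pos] != 0x0D:
        raw_name, ftype, length, decimals = struct.unpack_from("<11sc4xBB", data, pos)
        name = raw_name.split(b"\x00", 1)[0].decode(encoding)
        fields.append((name, ftype.decode("ascii"), length, decimals))
        pos += 32

    # One struct describes a whole record: deletion flag plus fixed-width fields.
    record = struct.Struct("<c" + "".join(f"{f[2]}s" for f in fields))
    if record.size != record_len:
        raise ValueError("field widths do not add up to the record length")

    body = data[header_len:header_len + n_records * record_len]
    rows = [r for r in record.iter_unpack(body) if r[0] != b"*"]  # skip deleted rows

    columns = {}
    for i, (name, ftype, _, decimals) in enumerate(fields, start=1):
        columns[name] = [_convert(r[i], ftype, decimals, encoding) for r in rows]
    return columns
```

The returned dict could take the place of d in dbf2DF, for example pd.DataFrame(read_dbf_columns(dbfile)). Whether this actually beats by_col on a given file would need to be measured.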


Tags: dataframe, dbf, pandas, vectorization