Python: reading headers at a specific line

Problem description

I am trying to read a text file whose header spans multiple lines, but the header only starts at line 1000. For example, my header looks like this:

LN Type Pct Amount Principal
TP Code Due Owed

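So, as you can see, my header wraps onto a second line and starts at line 1000 of the text file. How can I import this into Python so that it recognizes my headers and columns?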

My code so far

import pandas as pd

topheader = 'Acct Total'
with open('1000.txt') as f:
    for num, line in enumerate(f, 1):
        if topheader in line:
            # I know this is incorrect, but I need help
            df = pd.read_csv('1000.txt', header=[num, next()])

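Every time "Acct Total" appears in the file (line 999), the header is on the next line (line 1000). How can I get Python to read the header at line 1000 and recognize that it wraps across two lines?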

Solution

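Something like the following might work for you. StringIO simply makes a string behave like a file; it is used here only so that the snippet is runnable on its own.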

import pandas as pd
from io import StringIO  # just for example

text = """#
#
#
#
#
#
#
#
LN Type Pct Amount Principal
TP Code Due Owed
1 2 3 4
5 6 7 8
9 1 2 3
4 5 6 7
8 9 1 0"""

f = StringIO(text)

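while True:
    line = f.readline()
    if not line:  # readline() returns "" at end of file
        raise ValueError("no header line starting with 'LN' found")
    line = line.strip()
    if line.startswith("LN"):
        break  # found where the columns start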
line2 = f.readline()  # get the next row
# construct column names by pairing the two header rows;
# zip stops at the shorter row, so the unpaired "Principal" is dropped
names = [f"{a}_{b}" for a, b in zip(line.split(), line2.split())]

# the file is now positioned at the start of the data, so pandas will start reading from there
# pass in the column names explicitly
# read_table and read_csv have similar call signatures
df = pd.read_table(f, header=None, sep=" ", names=names)
print(df)

Output:

   LN_TP  Type_Code  Pct_Due  Amount_Owed
0      1          2        3            4
1      5          6        7            8
2      9          1        2            3
3      4          5        6            7
4      8          9        1            0
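
To adapt this to the layout described in the question, where a line containing "Acct Total" comes right before the two header lines, the same idea could be sketched roughly as follows. This is only a sketch under assumptions: the helper name read_after_marker is hypothetical, the columns are assumed to be separated by whitespace, and the sample text is made up for illustration rather than taken from the real file.

```python
from io import StringIO

import pandas as pd


def read_after_marker(handle, marker="Acct Total"):
    """Hypothetical helper: read the table that follows a marker line.

    Assumes the two lines right after the marker form a wrapped header
    and that columns are separated by whitespace.
    """
    # Advance until the marker line has been consumed.
    while True:
        line = handle.readline()
        if not line:  # readline() returns "" at end of file
            raise ValueError(f"marker {marker!r} not found")
        if marker in line:
            break
    top = handle.readline().split()
    bottom = handle.readline().split()
    # Pair the two header rows; zip stops at the shorter row.
    names = [f"{a}_{b}" for a, b in zip(top, bottom)]
    # The handle now sits at the first data row.
    return pd.read_csv(handle, header=None, sep=r"\s+", names=names)


# Made-up sample standing in for the real file.
sample = """junk line
Acct Total
LN Type Pct Amount
TP Code Due Owed
1 2 3 4
5 6 7 8
"""
df = read_after_marker(StringIO(sample))
print(df)
```

With a real file, the handle returned by open('1000.txt') should be usable in place of the StringIO object, since the helper only relies on readline() and leaves the handle at the first data row for pandas.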