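Problem description
I'm trying to read a text file that has multiple header lines, but the header starts at line 1000. For example, my header looks like this:
LN Type Pct Amount Principal
TP Code Due Owed
So, as you can see, my header is wrapped across two lines and starts at line 1000 of the text file. How can I import this into Python so that it recognizes my headers and columns?
My code so far: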
import pandas as pd

topheader = 'Acct Total'
with open('1000.txt') as f:
    for num, line in enumerate(f, 1):
        if topheader in line:
            df = pd.read_csv('1000.txt', header=[num, next()])  # I know this is incorrect, but I need help
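Every time 'Acct Total' appears in the file (line 999), the header is on the next line (line 1000). How do I get Python to read the header at line 1000 and recognize that the header is wrapped?
Solution
Something like the following may work for you. StringIO just makes a string behave like a file; it is only used here to make this snippet runnable.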
from io import StringIO  # just for the example

import pandas as pd

text = """#
#
#
#
#
#
#
#
LN Type Pct Amount Principal
TP Code Due Owed
1 2 3 4
5 6 7 8
9 1 2 3
4 5 6 7
8 9 1 0"""
f = StringIO(text)

while True:
    line = f.readline()
    if not line:  # end of file reached without finding the header
        raise ValueError("header line starting with 'LN' not found")
    line = line.strip()
    if line.startswith("LN"):
        break  # found where the columns start
line2 = f.readline()  # get the next row (second header line)

# construct column names; zip stops at the shorter line, so "Principal" is dropped
names = [f"{a}_{b}" for a, b in zip(line.split(), line2.split())]

# file is now at the start of the data, so pandas will start reading from there
# pass in the column names explicitly
# read_table and read_csv have similar call signatures
df = pd.read_table(f, header=None, sep=" ", names=names)
print(df)
Output:
   LN_TP  Type_Code  Pct_Due  Amount_Owed
0      1          2        3            4
1      5          6        7            8
2      9          1        2            3
3      4          5        6            7
4      8          9        1            0
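The same idea could be adapted to the situation in the question, where the two header lines come right after a line containing 'Acct Total'. The following is only a minimal sketch under some assumptions: read_wrapped_table is a hypothetical helper name, the sample text is made up for illustration, and the columns are assumed to be separated by runs of whitespace (hence sep=r"\s+" instead of a single space). It uses itertools.zip_longest so that a column named on only one of the two header lines is not silently dropped.

```python
from io import StringIO
from itertools import zip_longest

import pandas as pd


def read_wrapped_table(handle, marker="Acct Total"):
    """Hypothetical helper: read a table whose two-line header follows `marker`."""
    # Skip everything up to and including the line that contains the marker.
    for line in iter(handle.readline, ""):
        if marker in line:
            break
    else:
        raise ValueError(f"marker {marker!r} not found")

    # The next two lines are the wrapped header.
    top = handle.readline().split()
    bottom = handle.readline().split()

    # zip_longest keeps a column even when only one header line names it.
    names = ["_".join(part for part in pair if part)
             for pair in zip_longest(top, bottom, fillvalue="")]

    # The handle now points at the first data row, so pandas reads from there.
    return pd.read_csv(handle, sep=r"\s+", header=None, names=names)


# Made-up sample data standing in for the real file.
sample = StringIO(
    "report preamble\n"
    "Acct Total\n"
    "LN   Type  Pct  Amount\n"
    "TP   Code  Due  Owed\n"
    "1    2     3    4\n"
    "5    6     7    8\n"
)
df = read_wrapped_table(sample)
print(df)
```

With a real file, the same function should work on an open handle, for example: with open('1000.txt') as f: df = read_wrapped_table(f). If the two header lines do not line up column by column in the actual file, the name-building step would need to be adjusted.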