How to capture a function's printed output and put it into a DataFrame

Problem description

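I have a function that pulls a line of text from a website, and a loop that appends the last 64 characters of that text to the URL to fetch the next line, over and over until a line ends with "END". It prints the output I want, but I'd like to put that output into a pandas DataFrame so I can clean the data and do some analysis.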

Here is my code so far:

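from urllib import request

def get_chunk(chunk,url='https://www.uchicago.computer/api.PHP?file='):
    # Fetch the text stored under this key and strip surrounding whitespace
    with request.urlopen(url + chunk) as f:
        return f.read().decode('UTF-8').strip()

if __name__ == '__main__':
    chunk = 'insertsixtyfourrandomcharactershereabcdefghijklmnopqrstuvyxyz123'
    # Use the last 64 characters of each response as the next key,
    # stopping once a response ends with "END"
    while chunk[-3:] != "END":
        chunk = get_chunk(chunk[-64:])
        print(chunk)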

The output looks like this:

{"Last Name": "DOE","First Name": "JOHN","Job Title": "EXEC SECRETARY/OFFICE MGR","2020 Annual Salary": "100,000.00"}
RTBFRequest: John Doe
{"Last Name": "JANE","First Name": "MARY","Job Title": "CHIEF OF STAFF","2020 Annual Salary": "11,111.11"}
....
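Since the loop only prints, one way to get at its output without restructuring it is the standard library's `contextlib.redirect_stdout`, which temporarily sends everything `print` writes into an `io.StringIO` buffer. The sketch below assumes the loop has been wrapped in a function; `demo` is a hypothetical stand-in for that function and `capture_printed_lines` is a hypothetical helper name, not part of any library.

```python
import io
from contextlib import redirect_stdout

import pandas as pd


def capture_printed_lines(func, *args, **kwargs):
    """Run func, capture everything it prints to stdout, and return the lines."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args, **kwargs)
    # splitlines() drops the newline that print() adds after each line
    return buffer.getvalue().splitlines()


def demo():
    # Hypothetical stand-in for the scraping loop: it just prints two lines.
    print('{"Last Name": "DOE","First Name": "JOHN"}')
    print("RTBFRequest: John Doe")


lines = capture_printed_lines(demo)
df = pd.DataFrame({"Entry": lines})
print(df)
```

Capturing stdout is mainly useful when the printing function can't be changed; if it can, collecting the values directly in a list is simpler and avoids parsing text back out of a buffer.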

But I want to return the output in a DataFrame like this, where each line of output is its own row:

|Entry                  |         
|-----------------------|
|"Last Name": "DOE"...  |
|"RTBFRequest: John"... |
|"Last Name": "JANE"... |
|....                   |
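Once the lines are in a Python list, the layout in the table above is a one-column DataFrame. A minimal sketch, assuming the lines have already been collected (here the three sample lines from the question are typed in by hand):

```python
import pandas as pd

# Sample lines copied from the question's output; in practice they would
# come from the scraping loop.
lines = [
    '{"Last Name": "DOE","First Name": "JOHN","Job Title": "EXEC SECRETARY/OFFICE MGR","2020 Annual Salary": "100,000.00"}',
    "RTBFRequest: John Doe",
    '{"Last Name": "JANE","First Name": "MARY","Job Title": "CHIEF OF STAFF","2020 Annual Salary": "11,111.11"}',
]

# A dict mapping one column name to a list gives one row per list element.
df = pd.DataFrame({"Entry": lines})
print(df)
```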

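I tried defining an empty list, appending each chunk to it, and returning that list so the output could go into a DataFrame, but it only returns a small part of the output.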
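The attempted code isn't shown, so the cause is a guess, but a truncated result usually means the list is re-created inside the loop or `return` runs before the loop finishes. A minimal sketch of the list approach, where the list is created once before the loop and returned after it. `collect_chunks` and `fake_fetch` are hypothetical names; `fake_fetch` serves canned responses so the example runs offline, and with the real API you would pass the question's `get_chunk` instead.

```python
import pandas as pd


def collect_chunks(fetch, start, marker="END", key_len=64):
    """Follow the chain of chunks and return all of them in a list.

    fetch is any callable that takes a key string and returns the next
    chunk, for example the question's get_chunk.
    """
    chunks = []  # created once, outside the loop
    chunk = start
    while not chunk.endswith(marker):
        # The last key_len characters of each chunk are the next key
        chunk = fetch(chunk[-key_len:])
        chunks.append(chunk)
    return chunks  # returned only after the loop has finished


def fake_fetch(key):
    # Hypothetical offline stand-in for get_chunk with two canned responses.
    responses = {"a" * 64: "line one" + "b" * 64, "b" * 64: "line two END"}
    return responses[key]


chunks = collect_chunks(fake_fetch, "a" * 64)
df = pd.DataFrame({"Entry": chunks})
print(df)
```

Like the original `print` loop, this keeps the final chunk that ends with "END".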


Any help would be greatly appreciated!

Solution

This should work once the records are in a list (here the two JSON records are typed in by hand):

import numpy as np
import pandas as pd
data = [{"Last Name": "DOE","First Name": "JOHN","Job Title": "EXEC SECRETARY/OFFICE MGR","2020 Annual Salary": "100,000.00"},
        {"Last Name": "JANE","First Name": "MARY","Job Title": "CHIEF OF STAFF","2020 Annual Salary": "11,111.11"}]
# One dict per Series element, turned into a single-column DataFrame named 'Entry'
df = pd.Series(np.array(data)).to_frame('Entry')

Result:

                                               Entry
0  {'Last Name': 'DOE','First Name': 'JOHN','Jo...
1  {'Last Name': 'JANE','First Name': 'MARY','J...
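For the cleaning and analysis step, each JSON line can be parsed into its own set of columns instead of staying one string or dict per row. A minimal sketch using the standard `json` module, assuming that lines which are not valid JSON (such as the "RTBFRequest: ..." lines) should simply be skipped; whether they should be dropped or handled some other way depends on the analysis.

```python
import json

import pandas as pd

lines = [
    '{"Last Name": "DOE","First Name": "JOHN","Job Title": "EXEC SECRETARY/OFFICE MGR","2020 Annual Salary": "100,000.00"}',
    "RTBFRequest: John Doe",
    '{"Last Name": "JANE","First Name": "MARY","Job Title": "CHIEF OF STAFF","2020 Annual Salary": "11,111.11"}',
]

records = []
for line in lines:
    try:
        records.append(json.loads(line))
    except json.JSONDecodeError:
        # Assumption: non-JSON lines such as "RTBFRequest: ..." are skipped.
        continue

# Each dict key becomes a column
df = pd.DataFrame(records)
# Remove thousands separators so the salary column becomes numeric
df["2020 Annual Salary"] = (
    df["2020 Annual Salary"].str.replace(",", "", regex=False).astype(float)
)
print(df)
```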