Problem description
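I have a function that pulls a line of text from a website and then recursively appends that text to the URL. It prints the output I want, but I would like to put that output into a pandas DataFrame, where I can clean the data and do some analysis.
Here is my code so far: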
from urllib import request

def get_chunk(chunk, url='https://www.uchicago.computer/api.PHP?file='):
    with request.urlopen(url + chunk) as f:
        return f.read().decode('UTF-8').strip()

if __name__ == '__main__':
    chunk = 'insertsixtyfourrandomcharactershereabcdefghijklmnopqrstuvyxyz123'
    while chunk[-3:] != "END":
        chunk = get_chunk(chunk[-64:])
        print(chunk)
{"Last Name": "DOE","First Name": "JOHN","Job Title": "EXEC SECRETARY/OFFICE MGR","2020 Annual Salary": "100,000.00"}
RTBFRequest: John Doe
{"Last Name": "JANE","First Name": "MARY","Job Title": "CHIEF OF STAFF","2020 Annual Salary": "11,111.11"}
....
But I would like to get the output into a DataFrame like this, where each line of output is its own row.
|Entry |
|-----------------------|
|"Last Name": "DOE"... |
|"RTBFRequest: John"... |
|"Last Name": "JANE"... |
|.... |
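I tried defining an empty list, appending chunk to it, and then returning the list so that the output could be added to a DataFrame, but it only returns a small part of the output.
Any help would be greatly appreciated!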
Solution
This should work:
import numpy as np
import pandas as pd
data = [{"Last Name": "DOE","First Name": "JOHN","Job Title": "EXEC SECRETARY/OFFICE MGR","2020 Annual Salary": "100,000.00"},{"Last Name": "JANE","First Name": "MARY","Job Title": "CHIEF OF STAFF","2020 Annual Salary": "11,111.11"}]
df = pd.Series(np.array(data)).to_frame('Entry')
Result
Entry
0 {'Last Name': 'DOE','First Name': 'JOHN','Jo...
1 {'Last Name': 'JANE','First Name': 'MARY','J...
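To connect this back to the loop in the question, here is a minimal sketch of collecting every chunk in a list and building the DataFrame once the loop has finished. The names collect_chunks, fake_fetch and pages are hypothetical and not from the original post; fake_fetch stands in for get_chunk so the example runs without network access. One possible cause of getting only part of the output, assuming the return statement was placed inside the while loop, is that the function exits on the first iteration.

```python
import pandas as pd


def collect_chunks(first_chunk, fetch):
    """Follow the chain of chunks and return every response as a list.

    `fetch` is any callable that takes the key (the last 64 characters of
    the previous chunk) and returns the next chunk as a string.
    """
    entries = []
    chunk = first_chunk
    while chunk[-3:] != "END":
        chunk = fetch(chunk[-64:])
        entries.append(chunk)  # append inside the loop
    return entries  # return only after the loop has finished


# Hypothetical stand-in for get_chunk: a dict lookup instead of an HTTP call.
pages = {
    "seed": "row one|k1",
    "row one|k1": "row two|k2",
    "row two|k2": "row three END",
}


def fake_fetch(key):
    return pages[key]


entries = collect_chunks("seed", fake_fetch)
df = pd.DataFrame({"Entry": entries})  # one row per collected chunk
print(df)
```

With the real endpoint, passing get_chunk as fetch and the 64-character starting string as first_chunk should give one row per response, assuming the server behaves as described in the question.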