Problem description
How do I convert multiple HTML tables like the following:
PS4
Game Name | Price
GoW | 49.99
FF VII R | 59.99
XBX
Game Name | Price
Gears 5 | 49.99
Forza 5 | 59.99
into a JSON object? The HTML source of the tables is:
<table>
<tr colspan="2">
<td>PS4</td>
</tr>
<tr>
<td>Game Name</td>
<td>Price</td>
</tr>
<tr>
<td>GoW</td>
<td>49.99</td>
</tr>
<tr>
<td>FF VII R</td>
<td>59.99</td>
</tr>
</table>
<table>
<tr colspan="2">
<td>XBX</td>
</tr>
<tr>
<td>Game Name</td>
<td>Price</td>
</tr>
<tr>
<td>Gears 5</td>
<td>49.99</td>
</tr>
<tr>
<td>Forza 5</td>
<td>59.99</td>
</tr>
</table>
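I tried loading the HTML file that contains the tables with pandas.read_html(path/to/file). It does return a list of DataFrames, but I don't know how to extract the data from there, especially since the platform name sits in the table's title row rather than in a column of its own.
I'm using pandas because I extract these tables from local .htm files that also contain other kinds of tables and HTML code.
The JSON I want to end up with is:
[
  { "Game Name": "GoW", "Price": "49.99", "platform": "PS4" },
  { "Game Name": "FF VII R", "Price": "59.99", "platform": "PS4" },
  { "Game Name": "Gears 5", "Price": "49.99", "platform": "XBX" },
  { "Game Name": "Forza 5", "Price": "59.99", "platform": "XBX" }
]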
Solution
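One possible approach is sketched below. It deliberately swaps pandas for the standard library's html.parser, so that the title row holding the platform name stays available as an ordinary row. It assumes every table of interest has exactly the layout shown in the question: one single-cell row with the platform, then a "Game Name" / "Price" header row, then the data rows. The names TableCollector and tables_to_records are made up for this illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser
import json


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows; each row is a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def tables_to_records(html, header=("Game Name", "Price")):
    """Turn matching tables into a flat list of dicts with a 'platform' key."""
    parser = TableCollector()
    parser.feed(html)
    records = []
    for rows in parser.tables:
        # Assumed layout: [platform], [header cells], then the data rows.
        if len(rows) < 2 or len(rows[0]) != 1 or tuple(rows[1]) != header:
            continue  # skip tables that have a different shape
        platform = rows[0][0]
        for row in rows[2:]:
            record = dict(zip(header, row))
            record["platform"] = platform
            records.append(record)
    return records


# Sample input: one unrelated table plus two tables in the assumed layout.
html_doc = """
<table><tr><td>unrelated</td><td>table</td></tr></table>
<table>
<tr colspan="2"><td>PS4</td></tr>
<tr><td>Game Name</td><td>Price</td></tr>
<tr><td>GoW</td><td>49.99</td></tr>
<tr><td>FF VII R</td><td>59.99</td></tr>
</table>
<table>
<tr colspan="2"><td>XBX</td></tr>
<tr><td>Game Name</td><td>Price</td></tr>
<tr><td>Gears 5</td><td>49.99</td></tr>
</table>
"""

records = tables_to_records(html_doc)
print(json.dumps(records, indent=2))
```

For a local file, the same function can be fed the file's text, for example the result of open(...).read() on the .htm file. Prices stay strings, as in the desired output. Tables that do not start with a single-cell row followed by the expected header are skipped, which is how the other tables in the file get filtered out.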