How to scrape a non-RESTful API with Python requests?

Problem description

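I'm trying to scrape this website. In the Network tab of Google Chrome's developer tools I found that the request named hospitals, sent to https://tncovidbeds.tnega.org/api/hospitals, returns the response I need.

I tried to reproduce that request in Python with the same headers and payload, but the response I get is different from the one the website gets.

Here is my Python code: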

import requests

url = r'https://tncovidbeds.tnega.org/api/hospitals'

d = {
    "searchString": "",
    "sortCondition": {"Name": 1},
    "pageNumber": 1,
    "pageLimit": 10,
    "SortValue": "Availability",
    "districts": ["5ea0abd3d43ec2250a483a4f"],
    "browserId": "b4c5b065a84c7d2b60e8b23d415b2c3a",
    "IsGovernmentHospital": "true",
    "IsPrivateHospital": "true",
    "FacilityTypes": ["CHO", "CHC", "CCC"],
}

h = {
"authority": "tncovidbeds.tnega.org","method": "POST","path":"/api/hospitals","scheme": "https","accept": "application/json,text/plain,*/*","accept-encoding": "gzip,deflate,br","accept-language": "en-US,en;q=0.9","cache-control": "no-cache","content-length": "280","content-type": "application/json;charset=UTF-8","cookie": "_ga=GA1.2.1066740172.1620653373; _gid=GA1.2.1460220464.1620653373","origin": "https://tncovidbeds.tnega.org","pragma": "no-cache","sec-ch-ua": '" Not A;Brand";v="99","Chromium";v="90","Google Chrome";v="90"',"sec-ch-ua-mobile": "?0","sec-fetch-dest": "empty","sec-fetch-mode": "cors","sec-fetch-site": "same-origin","token": "null",}

res = requests.post(url,data=d,headers=h)
print(res.json())

The output I get is:

{
'result': None,'exception': '','pagination': None,'statusCode': '500','errors': [],'warnings': []
}

The response I need, which is what the Network tab shows, is:

{
"result": A BIG LIST OF JSON OBJECTS,"exception":null,"pagination":{"pageNumber":1,"skipCount":0,"totalCount":155},"statusCode":"200","errors":[],"warnings":[]}

Can you suggest a solution?

Thanks in advance.

Solution

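As your browser request shows, the content-type must be application/json;charset=UTF-8. When you pass the payload through the data parameter, requests form-encodes it (application/x-www-form-urlencoded) instead of serializing it as JSON. Because your headers still declare application/json, the server receives a body that does not match its declared type, which is most likely why it answers with statusCode 500. To fix this, pass the payload through the json parameter instead: requests then serializes the dict to JSON and sets the correct content-type automatically.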

requests.post(url, json=d)
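To see the difference without contacting the server, you can build the requests with requests.Request(...).prepare() and inspect what would be sent. This is a minimal sketch: the payload is a trimmed copy of the one in the question, and the third variant reproduces what the question's code effectively sends (a form-encoded body under a header that still claims JSON). The exact body formatting may differ slightly between requests versions.

```python
import json

import requests

URL = "https://tncovidbeds.tnega.org/api/hospitals"

# A trimmed-down version of the payload from the question.
payload = {
    "searchString": "",
    "sortCondition": {"Name": 1},
    "pageNumber": 1,
    "districts": ["5ea0abd3d43ec2250a483a4f"],
}

# Prepare (but do not send) each variant so the bodies can be inspected offline.
form_req = requests.Request("POST", URL, data=payload).prepare()
json_req = requests.Request("POST", URL, json=payload).prepare()

# What the question's code effectively sent: a form-encoded body under a
# user-supplied header that still claims JSON (requests keeps that header).
mismatched_req = requests.Request(
    "POST",
    URL,
    data=payload,
    headers={"content-type": "application/json;charset=UTF-8"},
).prepare()

print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)  # flat key=value pairs; the nested sortCondition dict is reduced to its key
print(json_req.headers["Content-Type"])  # application/json
print(json.loads(json_req.body) == payload)  # True: the nested structure survives
print(mismatched_req.headers["Content-Type"], mismatched_req.body)
```

Note how data= also loses structure: the nested sortCondition object is reduced to its key, so even with a form-aware server the original payload could not be reconstructed.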

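Also, in your case you don't need to send any of the extra headers. The ones copied from the developer tools, including the HTTP/2 pseudo-headers (authority, method, path, scheme) and the hard-coded content-length, can simply be dropped.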
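Putting the answer together, a minimal sketch of a complete scraper might look like the following. It sends only the JSON payload, with no headers copied from the browser, and walks through result pages using the pagination block seen in the browser's response. Several points are assumptions rather than facts from the answer: that pageNumber and pageLimit page through results as their names suggest, that the API's own statusCode field is the string "200" on success, and that the endpoint still behaves as it did when the question was written. The helper names build_payload and fetch_all_hospitals are hypothetical; session can be a requests.Session or any object with a compatible post method.

```python
URL = "https://tncovidbeds.tnega.org/api/hospitals"


def build_payload(district_ids, page_number=1, page_limit=10):
    """Search body using the field names seen in the browser request (hypothetical helper)."""
    return {
        "searchString": "",
        "sortCondition": {"Name": 1},
        "pageNumber": page_number,
        "pageLimit": page_limit,
        "SortValue": "Availability",
        "districts": list(district_ids),
        "browserId": "b4c5b065a84c7d2b60e8b23d415b2c3a",  # copied from the question
        "IsGovernmentHospital": "true",
        "IsPrivateHospital": "true",
        "FacilityTypes": ["CHO", "CHC", "CCC"],
    }


def fetch_all_hospitals(session, district_ids, page_limit=10):
    """Collect every page of results; session is e.g. a requests.Session (assumption)."""
    hospitals = []
    page = 1
    while True:
        # json= serializes the dict and sets Content-Type: application/json;
        # no headers copied from the developer tools are sent.
        res = session.post(
            URL, json=build_payload(district_ids, page, page_limit), timeout=30
        )
        res.raise_for_status()  # HTTP-level errors (4xx/5xx)
        body = res.json()
        # The API also reports a status inside the JSON body (the question's
        # failing call came back with "statusCode": "500"), so check it too.
        if body.get("statusCode") != "200":
            raise RuntimeError(f"API returned statusCode={body.get('statusCode')!r}")
        page_items = body.get("result") or []
        hospitals.extend(page_items)
        total = (body.get("pagination") or {}).get("totalCount", 0)
        # Stop on an empty page or once the reported total has been reached.
        if not page_items or len(hospitals) >= total:
            return hospitals
        page += 1
```

For example, inside a with requests.Session() as session: block, calling fetch_all_hospitals(session, ["5ea0abd3d43ec2250a483a4f"]) would collect every hospital in the district from the question. Passing the session in, rather than calling requests.post directly, reuses the connection across pages and makes the function easy to test with a stub.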