本篇代码提供者: 青灯教育-巳月老师
知识点:
- 动态数据抓包
- requests发送请求
- json数据解析
开发环境:
运行代码
- python 3.8
辅助敲代码
- pycharm 2021.2
第三方模块
- requests
如果安装python第三方模块:
- win + R 输入 cmd 点击确定, 输入安装命令 pip install 模块名 (pip install requests) 回车
- 在pycharm中点击Terminal(终端) 输入安装命令
如何配置pycharm里面的python解释器?
-
选择file(文件) >>> setting(设置) >>> Project(项目) >>> python interpreter(python解释器)
-
点击齿轮, 选择add
pycharm如何安装插件?
-
点击 Marketplace 输入想要安装的插件名字 比如:翻译插件 输入 translation / 汉化插件 输入 Chinese
-
选择相应的插件点击 install(安装) 即可
-
安装成功之后 是会弹出 重启pycharm的选项 点击确定, 重启即可生效
基本思路流程(通用)
- 发送请求
- 获取数据
- 解析数据
- 保存数据
代码
导入模块
import requests # 发送请求 访问网站
import re
加入伪装
url = 'https://www.kuaishou.com/graphql'
# 伪装
headers = {
'content-type': 'application/json',
'Cookie': 'kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; did=web_ea128125517a46bd491ae9ccb255e242; client_key=65890b29; didv=1646739254078; _bl_uid=pCldq3L00L61qCzj6fytnk2wmhz5; userId=270932146; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABJBZGcI4Czt3EqaG90aWFm2EPPYcAQlkuV3ZOkHUBcqEtWV--udx6stXFOhEkGgx4tNCBS9Vhl-GstWLkvn-r_eV1072IPsO8d5sUcUuTJv3nicPWVBcfHW813ST2a4uN7HyHsnpnRkjx2BFXomrmdsO4tbAgy3-3QRTaw05tiEp79qGRfQgVmK4kVJLOTF9X7o9vSjrLrTaRxEf0rwsXJhoSsmhEcimAl3NtJGybSc8y6sdlIiCkLEL3ZiZwp85TGjXIHaw7KGda_VNpfdZ1qigsOkLmESgFMAE; kuaishou.server.web_ph=ad8a744eb59aab3bf509625671ad16837e66',
'Host': 'www.kuaishou.com',
'Origin': 'https://www.kuaishou.com',
'Referer': 'https://www.kuaishou.com/search/video?searchKey=%E5%8F%8C%E9%A9%AC%E5%B0%BE',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36',
}
# 传数据
for page in range(1, 10):
json = {
'operationName': "visionSearchPhoto",
'query': "fragment photoContent on PhotoEntity {\n id\n duration\n caption\n likeCount\n viewCount\n realLikeCount\n coverUrl\n photoUrl\n photoH265Url\n manifest\n manifestH265\n videoResource\n coverUrls {\n url\n __typename\n }\n timestamp\n expTag\n animatedCoverUrl\n distance\n videoRatio\n liked\n stereoType\n profileUserTopPhoto\n __typename\n}\n\nfragment FeedContent on Feed {\n type\n author {\n id\n name\n headerUrl\n following\n headerUrls {\n url\n __typename\n }\n __typename\n }\n photo {\n ...photoContent\n __typename\n }\n canAddComment\n llsid\n status\n currentPcursor\n __typename\n}\n\nquery visionSearchPhoto($keyword: String, $pcursor: String, $searchSessionId: String, $page: String, $webPageArea: String) {\n visionSearchPhoto(keyword: $keyword, pcursor: $pcursor, searchSessionId: $searchSessionId, page: $page, webPageArea: $webPageArea) {\n result\n llsid\n webPageArea\n Feeds {\n ...FeedContent\n __typename\n }\n searchSessionId\n pcursor\n aladdinBanner {\n imgurl\n link\n __typename\n }\n __typename\n }\n}\n",
'variables': {'keyword': "双马尾", 'pcursor': str(page), 'page': "search", 'searchSessionId': "MTRfMjcwOTMyMTQ2XzE2NTU3MjcyNTU3NjFf5Y-M6ams5bC-XzUwMDQ"}
}
1. 发送请求
response = requests.post(url=url, headers=headers, json=json)
2. 获取数据
<Response [200]>: 请求成功
Feeds = response.json()['data']['visionSearchPhoto']['Feeds']
for Feed in Feeds:
author_id = Feed['author']['id']
photo_id = Feed['photo']['id']
print(author_id, photo_id)
caption = Feed['photo']['caption']
photoUrl = Feed['photo']['photoUrl']
print(caption, photoUrl)
caption = re.sub('[\\/:"?<>|\\n]', '', caption)
json_like = {
'operationName': "visionVideoLike",
'query': "mutation visionVideoLike($photoId: String, $photoAuthorId: String, $cancel: Int, $expTag: String) {\n visionVideoLike(photoId: $photoId, photoAuthorId: $photoAuthorId, cancel: $cancel, expTag: $expTag) {\n result\n __typename\n }\n}\n",
'variables': {
'cancel': 0,
'expTag': "1_i/2005246154318902369_xpcwebsearchxxnull0",
'photoAuthorId': author_id,
'photoId': photo_id
}
}
resp_ = requests.post(url=url, headers=headers, json=json_like)
# video_data = requests.get(photoUrl).content
# with open(f'video/{caption}.mp4', mode='wb') as f:
# f.write(video_data)
尾语
好了,我的这篇文章写到这里就结束啦!
有更多建议或问题可以评论区或私信我哦!一起加油努力叭(ง •_•)ง