Problem description
I'm new to Python. In PyCharm, I wrote this code:
import requests
from bs4 import BeautifulSoup
response = requests.get(f"https://www.google.com/search?q=fitness+wear")
soup = BeautifulSoup(response.text,'html.parser')
print(soup)
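I used the same code in a script on pythonanywhere.com and it works fine there. I've tried many of the solutions I found, but the result is always the same, so now I'm stuck.
Solution
I think this should work: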
import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    url = "https://www.google.com/search?q=fitness+wear"
    # Browser-like headers sent with both requests
    headers = {
        "referer": "https://www.google.com/",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
    }
    # POST first so the session may pick up initial cookies, then GET the results page
    s.post(url, headers=headers)
    response = s.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    print(soup)
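It uses a requests Session and a POST request to create any initial cookies (I'm not completely sure about this part), and then lets you scrape.
If you open a private window in your browser and go to google.com, you should see the same popup asking for your consent. This is because you are not sending a session cookie.
You have several options to solve this. One is to send the cookies you can observe on the website directly, like this: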
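To see what the session does with cookies, here is a minimal offline sketch: nothing is sent over the network, the requests are only prepared so their headers can be inspected. It assumes a `CONSENT` cookie (the value quoted in the cookie answer in this thread) has already landed in the session's jar; in real use it would arrive via a `Set-Cookie` response header, and whether Google still honours that value is not verified here.

```python
import requests

url = "https://www.google.com/search?q=fitness+wear"

with requests.Session() as s:
    # Assumption: simulate a cookie that an earlier response would have set.
    s.cookies.set(
        "CONSENT",
        "YES+shp.gws-20210330-0-RC1.de+FX+412",
        domain=".google.com",
    )

    # prepare_request() merges the session's cookie jar into each request
    # without sending anything, so we can inspect the resulting headers.
    first = s.prepare_request(requests.Request("GET", url))
    second = s.prepare_request(requests.Request("GET", url + "&start=10"))

print(first.headers["Cookie"])
print(second.headers["Cookie"])
```

Because the cookie lives in `s.cookies`, every later request made through the same `Session` carries it, which is what the session-based answer relies on.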
import requests

cookies = {
    "CONSENT": "YES+shp.gws-20210330-0-RC1.de+FX+412",
    # ...
}
resp = requests.get("https://www.google.com/search?q=fitness+wear", cookies=cookies)
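A short offline sketch of this per-request approach: it prepares the request instead of sending it, so the `Cookie` header can be checked without contacting Google. Only the `CONSENT` cookie quoted above is used; any other cookies you see in the browser are left out. The requests documentation states that method-level parameters such as `cookies=` are not persisted across requests, even within a session, which the last check illustrates.

```python
import requests

url = "https://www.google.com/search?q=fitness+wear"
# Only the CONSENT cookie from the thread; other browser cookies omitted.
cookies = {"CONSENT": "YES+shp.gws-20210330-0-RC1.de+FX+412"}

with requests.Session() as s:
    # Cookies passed per request are attached to that request only...
    with_cookie = s.prepare_request(requests.Request("GET", url, cookies=cookies))
    # ...and are not written back into the session's cookie jar.
    without_cookie = s.prepare_request(requests.Request("GET", url))

print(with_cookie.headers.get("Cookie"))
print(without_cookie.headers.get("Cookie"))
print(len(s.cookies))
```

This is why the cookie dict has to be passed on every call, whereas a session keeps cookies it receives from the server.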
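@Dimitriy Kruglikov's solution is cleaner, though: using a session is a good way to keep a persistent session with a website.
Google isn't blocking you; you can still extract data from the HTML.
Using cookies isn't very convenient, and using a session with both POST and GET requests produces more traffic.
You can remove this popup with either the decompose() or extract() BS4 methods: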
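- annoying_popup.decompose() completely destroys the tag and its contents. Documentation.
- annoying_popup.extract() produces two HTML trees: one rooted at the BeautifulSoup object you used to parse the document, and another rooted at the extracted tag. Documentation.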
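After that, you can scrape everything you need; you can also scrape it without removing the popup at all.
See this Organic Results extraction I made recently. It scrapes titles, snippets, and links from Google search results.
Alternatively, you can use the Google Search Engine Results API from SerpApi. Check out the Playground.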
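A minimal sketch of both methods on a small, made-up page. The `consent-popup` id, the `result` class, and the `scrape_titles` helper are hypothetical stand-ins; Google's real markup is not reproduced here and will differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for a consent page.
html = """
<html><body>
  <div id="consent-popup"><p>Before you continue to Google</p></div>
  <div class="result"><h3>Fitness wear</h3></div>
</body></html>
"""


def scrape_titles(soup):
    """Hypothetical helper: collect result titles from the parsed page."""
    return [h3.get_text(strip=True) for h3 in soup.select("div.result h3")]


# extract(): detach the popup; it survives as its own tree.
soup = BeautifulSoup(html, "html.parser")
removed = soup.find("div", id="consent-popup").extract()
print(removed.p.get_text())           # Before you continue to Google
print(soup.find(id="consent-popup"))  # None: no longer in the page tree

# decompose(): destroy the popup and everything inside it.
soup2 = BeautifulSoup(html, "html.parser")
soup2.find("div", id="consent-popup").decompose()
print(scrape_titles(soup2))           # ['Fitness wear']

# Removing the popup is optional: the results are still in the HTML.
print(scrape_titles(BeautifulSoup(html, "html.parser")))  # ['Fitness wear']
```

Use `extract()` if you still want to inspect the popup afterwards; `decompose()` is the simpler choice when you only want it gone.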
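The linked extraction is not reproduced here; the sketch below only illustrates the general pattern of looping over result containers and pulling out a title, snippet, and link from each. The markup, the `div.result` / `div.snippet` selectors, and `extract_organic_results` are all hypothetical; Google's actual class names differ and change often, so adjust the selectors to what you see in the real response.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified result markup (not Google's real structure).
html = """
<div class="result">
  <a href="https://example.com/a"><h3>First title</h3></a>
  <div class="snippet">First snippet</div>
</div>
<div class="result">
  <a href="https://example.com/b"><h3>Second title</h3></a>
  <div class="snippet">Second snippet</div>
</div>
"""


def extract_organic_results(page_html):
    """Hypothetical helper: return a list of {title, snippet, link} dicts."""
    soup = BeautifulSoup(page_html, "html.parser")
    results = []
    for block in soup.select("div.result"):
        title = block.select_one("h3")
        link = block.select_one("a[href]")
        snippet = block.select_one("div.snippet")
        # Guard each field so one missing element doesn't crash the loop.
        results.append({
            "title": title.get_text(strip=True) if title else None,
            "snippet": snippet.get_text(strip=True) if snippet else None,
            "link": link["href"] if link else None,
        })
    return results


for r in extract_organic_results(html):
    print(f"Title: {r['title']}\nSnippet: {r['snippet']}\nLink: {r['link']}\n")
```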
from serpapi import GoogleSearch
import os

params = {
    "engine": "google",
    "q": "fus ro dah",
    "api_key": os.getenv("API_KEY"),  # SerpApi key read from the API_KEY environment variable
}

search = GoogleSearch(params)
results = search.get_dict()

for result in results['organic_results']:
    print(f"Title: {result['title']}\nSnippet: {result['snippet']}\nLink: {result['link']}\n")
Output:
Title: Skyrim - FUS RO DAH (Dovahkiin) HD - YouTube
Snippet: I looked around for a fan made track that included Fus Ro Dah, but the ones that I found were pretty bad - some ...
Link: https://www.youtube.com/watch?v=JblD-FN3tgs
Title: Unrelenting Force (Skyrim) | Elder Scrolls | Fandom
Snippet: If the general subtitles are turned on, it can be seen that the text for the Draugr's Unrelenting Force is misspelled: "Fus Rah Do" instead of the proper "Fus Ro Dah." ...
Link: https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)
Title: Fus Ro Dah | Know Your Meme
Snippet: Origin. "Fus Ro Dah" are the words for the "unrelenting force" thu'um shout in the game Elder Scrolls V: Skyrim. After reaching the first town of ...
Link: https://knowyourmeme.com/memes/fus-ro-dah
Title: Fus ro dah - Urban Dictionary
Snippet: 1. A dragon shout used in The Elder Scrolls V: Skyrim. 2.An international term for oral sex given by a female. ex.1. The Dragonborn yelled "Fus ...
Link: https://www.urbandictionary.com/define.php?term=Fus%20ro%20dah
Part of the JSON:
"organic_results": [
  {
    "position": 1,
    "title": "Unrelenting Force (Skyrim) | Elder Scrolls | Fandom",
    "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)",
    "displayed_link": "https://elderscrolls.fandom.com › wiki › Unrelenting_F...",
    "snippet": "If the general subtitles are turned on, it can be seen that the text for the Draugr's Unrelenting Force is misspelled: \"Fus Rah Do\" instead of the proper \"Fus Ro Dah.\" ...",
    "sitelinks": {
      "inline": [
        {
          "title": "Location",
          "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Location"
        },
        {
          "title": "Effect",
          "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Effect"
        },
        {
          "title": "Usage",
          "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Usage"
        },
        {
          "title": "Word Wall",
          "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Word_Wall"
        }
      ]
    },
    "cached_page_link": "https://webcache.googleusercontent.com/search?q=cache:K3LEBjvPps0J:https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)+&cd=17&hl=en&ct=clnk&gl=us"
  }
]
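To walk the nested structure of this fragment, including the `sitelinks` / `inline` list, here is a small offline sketch using the standard `json` module. It embeds a trimmed copy of the fragment wrapped in braces so it parses on its own; with the real client, `search.get_dict()` already returns a `dict`, so the `json.loads` step would be skipped. `iter_sitelinks` is a hypothetical helper, and treating `sitelinks` as optional on other results is an assumption.

```python
import json

# Trimmed copy of the "organic_results" fragment, wrapped in braces.
raw = """
{
  "organic_results": [
    {
      "position": 1,
      "title": "Unrelenting Force (Skyrim) | Elder Scrolls | Fandom",
      "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)",
      "sitelinks": {
        "inline": [
          {"title": "Location", "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Location"},
          {"title": "Effect", "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Effect"}
        ]
      }
    }
  ]
}
"""


def iter_sitelinks(data):
    """Hypothetical helper: flatten inline sitelinks to (position, title, link)."""
    rows = []
    for result in data.get("organic_results", []):
        # Assumption: not every result has "sitelinks", so default to empty.
        for sitelink in result.get("sitelinks", {}).get("inline", []):
            rows.append((result["position"], sitelink["title"], sitelink["link"]))
    return rows


data = json.loads(raw)
for position, title, link in iter_sitelinks(data):
    print(f"{position}. {title}: {link}")
```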
Disclaimer: I work for SerpApi.