Website blocks downloading with Selenium

Problem description

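I use Selenium in Colab to download seller data. For the past few days I have been unable to download the page content.

The seller's details are visible in a browser in incognito mode. In normal mode I have to log in to see the data.

How can I handle this?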

My code

# Install Chromium, its driver, and Selenium
!apt update
!apt install chromium-chromedriver
!pip install selenium
!pip install dnspython
!pip install pipedrive-python-lib
# Imports
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import pandas as pd
import pymongo
from pymongo import MongoClient
from datetime import date
from pipedrive.client import Client

# Set options to run headless
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--incognito')

url = 'https://allegro.pl/oferta/dr-coffee-f12-big-plus-ekspres-do-kawy-10196811305#aboutSeller'

wd = webdriver.Chrome(options=options)
wd.get(url)
wd.maximize_window()

soup = BeautifulSoup(wd.page_source, 'html.parser')

Contents of soup

<html><head><title>allegro.pl</title>
<style>#cmsg{animation: A 1.5s;}@keyframes A{0%{opacity:0;}99%{opacity:0;}100%{opacity:1;}}</style>
<meta content="width=device-width,initial-scale=1.0" name="viewport"/></head>
<body style="margin:0">
<script>var dd={'cid':'AHrlqAAAAAMAPz5ltF-0LmMAI-N-0w==','hsh':'77DC0FFBAA0B77570F6B414F8E5BDB','t':'bv','s':29560,'host':'geo.captcha-delivery.com'}</script>
<script src="https://ct.captcha-delivery.com/c.js"></script>
<script>if("string"==typeof navigator.userAgent&&navigator.userAgent.indexOf("Firefox")>-1){
var isIframeLoaded=!1,maxTimeoutMs=5e3;
function iframeOnload(e){isIframeLoaded=!0;var a=document.getElementById("noiframe");a&&a.parentNode.removeChild(a)}
var initialTime=(new Date).getTime();
setTimeout(function(){isIframeLoaded||(new Date).getTime()-initialTime>maxTimeoutMs&&(document.body.innerHTML='<div id="noiframe">Please enable JS and disable any ad blocker</div>'+document.body.innerHTML)},maxTimeoutMs)
}else function iframeOnload(){}</script>
<iframe border="0" frameborder="0" height="100%" onload="iframeOnload()" scrolling="yes" src="https://geo.captcha-delivery.com/captcha/?initialCid=AHrlqAAAAAMAPz5ltF-0LmMAI-N-0w%3D%3D&amp;hash=77DC0FFBAA0B77570F6B414F8E5BDB&amp;cid=T-jqitz6Xj5IAh.rlEft_uW8shQiyEx-q0h3fbxjp7ibFQdeKxAG4O8mJHUbhP_2L.dCWU9ZNi0VhRHr-_84zpxnvkfMwS-X8HKiPK8cue&amp;t=bv&amp;referer=https%3A%2F%2Fallegro.pl%2Foferta%2fdr-coffee-f12-big-plus-ekspres-do-kawy-10196811305&amp;s=29560" style="height:100vh;" width="100%"></iframe>
</body></html>
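
The returned page is not the offer page: it only loads a script and an iframe from captcha-delivery.com, so the site appears to be serving a CAPTCHA challenge to the headless browser. A minimal sketch of how the script could detect this before parsing follows. The helper name looks_like_captcha_page and the host-suffix constant are assumptions made for illustration (they are not part of the original code), and the check only recognizes the captcha-delivery.com host seen in the output above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Host seen in the page source above; treating it as the CAPTCHA marker is an assumption.
CAPTCHA_HOST_SUFFIX = "captcha-delivery.com"


class _SrcHostCollector(HTMLParser):
    """Collects the host names of every <script src> and <iframe src> in a page."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "iframe"):
            src = dict(attrs).get("src")
            if src:
                host = urlparse(src).hostname
                if host:
                    self.hosts.add(host)


def looks_like_captcha_page(html):
    """Return True if the page loads a script or iframe from the CAPTCHA host."""
    collector = _SrcHostCollector()
    collector.feed(html)
    return any(
        host == CAPTCHA_HOST_SUFFIX or host.endswith("." + CAPTCHA_HOST_SUFFIX)
        for host in collector.hosts
    )
```

In the script above this could be called as looks_like_captcha_page(wd.page_source) right after wd.get(url), so that a blocked request is reported instead of being parsed as if it were seller data.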
