Problem extracting data with Selenium WebDriver

Problem description

Hello, I am trying to extract the odds from this web page: https://www.unibet.fr/sport/football

Here is my Python script:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import numpy as np
import os

options = Options()
options.headless = True
options.add_argument("window-size=1400,800")
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")
options.add_argument("start-maximized")
options.add_argument("enable-automation")
options.add_argument("--disable-infobars")
options.add_argument("--disable-dev-shm-usage")

driver = webdriver.Chrome(options=options)

driver.get('https://www.unibet.fr/sport/football')

odds = [my_elem.text for my_elem in WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.XPATH,'//span[contains(@class,"ui-touchlink-needsclick price odd-price")]')))]

print(odds,'\n')

driver.close()
driver.quit()

The output is:

Traceback (most recent call last):
  File "./azerty.py", line 26, in <module>
    odds = [my_elem.text for my_elem in WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.XPATH,'//span[contains(@class,"ui-touchlink-needsclick price odd-price")]')))]
  File "/usr/local/lib/python3.8/dist-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: 

This script works perfectly with other web pages, but not with this one. Any help would be appreciated, thanks.

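Solution

You are getting a timeout because the page uses infinite loading: when you scroll to the bottom of the page, new elements are loaded. Even though some elements are already in the DOM, Selenium cannot find them, so the wait times out. Try loading all the elements first, then locate them.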

import time  # needed for time.sleep() below; driver, WebDriverWait, EC and By come from the question's script

driver.get('https://www.unibet.fr/sport/football')
# Wait for the page to load
WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.XPATH, '//a[@data-track-action="start_page"]')))

# Scroll until the page is completely loaded
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to the bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait for new content to load
    time.sleep(2)
    # Compare the new scroll height with the previous one
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

# Get the elements (find_elements(By.XPATH, ...) also works on newer Selenium 4 releases, where find_elements_by_xpath was removed)
odds = [my_elem.text for my_elem in driver.find_elements(By.XPATH, '//span[contains(@class,"ui-touchlink-needsclick price odd-price")]')]
print(odds, '\n')

driver.close()
driver.quit()
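
A possible refinement, not part of the original answer: the `while True` loop above never exits if the page keeps growing. The sketch below wraps the same scroll-and-compare logic in a helper with a cap on the number of scrolls. The name `scroll_until_stable` and the parameters `pause` and `max_rounds` are hypothetical, chosen for illustration; the only driver method it relies on is `execute_script`.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=30):
    """Scroll to the bottom until the page height stops growing.

    Hypothetical helper: returns the number of scrolls performed.
    `max_rounds` caps the loop so an endlessly growing page cannot
    hang the script.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Scroll down to the bottom of the page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give lazily loaded content time to arrive
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Height did not change, so nothing new was loaded
            return rounds
        last_height = new_height
    # Cap reached while the page was still growing
    return max_rounds
```

With a real driver you would call `scroll_until_stable(driver)` after `driver.get(...)` and before collecting the odds with `driver.find_elements(By.XPATH, ...)`. If the function returns `max_rounds`, the page may not have finished loading, so the list of odds could be incomplete.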
