Problem description
I am trying to use Selenium to scrape a table from the following website: https://web.archive.org/web/20120220031809/http://simcentral.net/ibaf/games/1
Code:
from selenium import webdriver as wd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as bs
from pandas.io.html import read_html
import pandas as pd
import numpy as np
import os
import re
os.chdir('c:/Users/Owner')
bat=pd.DataFrame()
driver = wd.Chrome()
wait = WebDriverWait(driver,15)
driver.get('https://web.archive.org/web/20120220031809/http://simcentral.net/ibaf/games/1')
page=driver.find_element_by_xpath('//*[contains(concat( " ",@class," " ),concat( " ","regtext"," " ))] | //*[contains(concat( " ",@class," " ),concat( " ","normal"," " ))]')
table_html=page.get_attribute('innerHTML')
driver.quit()
I get the following error:
StaleElementReferenceException: stale element reference: element is not attached to the page document
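I have looked online and understand the problem, but I don't know what to do about it. The other questions seem to locate the element by some means other than XPath. I know it stops working at the table_html= line, because if I remove that line, everything above it works fine and the browser closes as expected.
Thanks for your help.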
Solution
Try this:
import pandas as pd
data = pd.read_html("https://web.archive.org/web/20120220031809/http://simcentral.net/ibaf/games/1")
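Note that pd.read_html returns a list of DataFrames, one per <table> element it finds, so data still has to be narrowed down to the table you want. The sketch below is one possible way to do that. The helper name largest_table is hypothetical, and picking the table with the most cells is only an assumption about which table holds the box score; inspect the list yourself to confirm.

```python
import pandas as pd

URL = "https://web.archive.org/web/20120220031809/http://simcentral.net/ibaf/games/1"


def largest_table(tables):
    """Return the DataFrame with the most cells (rows * columns).

    Hypothetical helper: it assumes the table you want is the biggest
    one on the page, which may not hold for every page.
    """
    if not tables:
        raise ValueError("no tables were found")
    return max(tables, key=lambda df: df.shape[0] * df.shape[1])


# Usage (needs network access and an HTML parser such as lxml):
# data = pd.read_html(URL)        # list of DataFrames, one per <table>
# for i, df in enumerate(data):   # look at the shapes to pick a table
#     print(i, df.shape)
# box_score = largest_table(data)
```

If the largest table turns out not to be the right one, indexing the list directly (for example data[0]) after printing the shapes is a simpler option.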