Problem description
I am scraping this page https://www.betexplorer.com/soccer/russia/premier-league-2014-2015/results/, but sometimes the browser fails to load the page or cannot reach the site at all. How can I fix this?
from time import sleep
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

home = 'https://www.betexplorer.com/soccer/russia/premier-league-2014-2015/results/'
driver.get(home)
# scroll to the bottom so all result rows are rendered
driver.find_element_by_tag_name('body').send_keys(Keys.END)
links = driver.find_elements_by_xpath("//a[@class='in-match']")
urls = [link.get_attribute('href') for link in links]
for url in urls:
    driver.get(url)
    sleep(5)
    date = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, 'match-date'))).text
    hometeam = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '/html/body/div[4]/div[5]/div/div/div[1]/section/ul[2]/li[1]/figure/div/a/img'))).get_attribute("alt")
    awayteam = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '/html/body/div[4]/div[5]/div/div/div[1]/section/ul[2]/li[3]/figure/div/a/img'))).get_attribute("alt")
    # wait/locator reconstructed from the truncated original, which only kept 'js-score'
    ft = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, 'js-score'))).text
Solution
I have seen a similar question about extracting data from https://www.betexplorer.com, but it has since been deleted. It read roughly like this:

"Web scraping a list of elements. I want to scrape the matches, dates and results row by row into a csv file."

Here is the code:
import requests
from lxml import html
import pandas as pd
from pandas import ExcelWriter

url = 'https://www.betexplorer.com/soccer/russia/premier-league/results/'
site = 'https://www.betexplorer.com'
getr = requests.get(url)
src = html.fromstring(getr.content)
# match links, full-time scores and dates from the results table
game = src.xpath("//td[@class='h-text-left']//a//@href")
ft = src.xpath("//td[@class='h-text-center']//a//text()")
date = src.xpath("//td[@class='h-text-right h-text-no-wrap']//text()")
games = []
fts = []
dates = []
for (gm, ftt, dat) in zip(game, ft, date):
    gm = site + gm
    getg = requests.get(gm)
    srr = html.fromstring(getg.content)
    # team names from the breadcrumb of each match page
    teams = srr.xpath('//span[@class="list-breadcrumb__item__in"]//text()')
    for team in teams:
        games.append(team)
        fts.append(ftt)
        dates.append(dat)
fullfile = pd.DataFrame({'Games': games, 'Fts': fts, 'Dates': dates})
writer = ExcelWriter('D:\\yourpath\\games.xlsx')
fullfile.to_excel(writer, 'Sheet1', index=False)
writer.save()
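The intermittent load failures from the original question can hit plain requests too, so it is worth adding timeouts and automatic retries. Below is a minimal sketch using a requests Session with urllib3's Retry; the retry count, backoff factor and status codes are illustrative assumptions, and `make_session` is a hypothetical helper name, not part of the code above.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(retries=3, backoff=1.0):
    """Build a Session that retries failed GETs with exponential backoff."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff,
        status_forcelist=[429, 500, 502, 503, 504],  # retry on these HTTP codes
        allowed_methods=["GET"],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('https://', adapter)
    session.mount('http://', adapter)
    return session

session = make_session()
```

Then replace each `requests.get(...)` above with `session.get(..., timeout=10)`, so a hung connection raises an exception after 10 seconds instead of blocking forever, and transient server errors are retried automatically.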