How can I use Selenium to collect information about divs that are added dynamically on scroll?

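Problem description

I am trying to scrape data from skiplagged.com. The page I am scraping contains a lot of information that would be useful to me.

For example, suppose the page lists 100 flights. Even though I located the correct elements, I only managed to get information for 10 of them.

I later realized that new divs are added as I scroll down the page, so I could not scrape the rest of it.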

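Earlier attempts

I also looked at this question to try scrolling to the end of the page before scraping: How to scroll to the end of the page using selenium in Python?

However, my attempts failed.

Link to the web page I am trying to scrape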

https://skiplagged.com/flights/YTO/DXB/2020-08-21

My Python code

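from selenium.webdriver.common.by import By

# find_elements_by_xpath was removed in Selenium 4; find_elements(By.XPATH, ...) is the current form
infinite_list = driver.find_elements(By.XPATH, "//div[@class='infinite-trip-list']//div[@class='span1 trip-duration']")
for elem in infinite_list:
    print(elem.text)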

Solution

You can use execute_script with a small tweak:

import time

time.sleep(3)  # give the page time to load
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")  # scroll to the bottom of the page
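The fixed sleep above is a guess at how long the page needs. As a rough alternative, one could poll until more matching elements show up after a scroll. The sketch below is only an illustration: `wait_for_more_elements`, `previous_count`, `timeout` and `poll` are hypothetical names, not part of Selenium, and the only driver call used is `find_elements(by, value)`.

```python
import time


def wait_for_more_elements(driver, xpath, previous_count, timeout=10.0, poll=0.5):
    """Poll until more than previous_count elements match xpath.

    Returns True as soon as the count has grown, or False once
    timeout seconds have passed without any growth.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "xpath" is the string value of selenium's By.XPATH constant
        if len(driver.find_elements("xpath", xpath)) > previous_count:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A call would go right after a scroll, passing the number of elements found before the scroll as `previous_count`. A `False` result suggests that nothing more was loaded within the timeout.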

You can also keep scrolling until you reach the end of the page:

last = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(1)  # let the page load
    new = driver.execute_script("return document.body.scrollHeight")
    if new == last:  # height unchanged, so we have reached the end of the page
        break
    last = new  # remember the new height for the next comparison
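To tie this back to the question, the scroll loop can be wrapped in a function and followed by the element lookup, so that the query runs only after everything has loaded. This is a minimal sketch under some assumptions: the page scrolls through `document.body` (not an inner container), rows loaded earlier stay in the DOM, and the names `scroll_to_bottom`, `collect_texts` and `max_rounds` are hypothetical helpers introduced here. The `max_rounds` cap is added so the loop cannot run forever on a page that never stops growing.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=100):
    """Scroll down until the page height stops growing.

    Returns the number of scrolls performed (at most max_rounds).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # let the newly requested rows load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was added, so we are at the end
        last_height = new_height
    return max_rounds


def collect_texts(driver, xpath):
    """Return the text of every element currently matching xpath."""
    # "xpath" is the string value of selenium's By.XPATH constant
    return [elem.text for elem in driver.find_elements("xpath", xpath)]
```

With a live driver, one would call `scroll_to_bottom(driver)` first and then `collect_texts(driver, "//div[@class='infinite-trip-list']//div[@class='span1 trip-duration']")`. If the site removes earlier rows while scrolling, the texts would have to be collected inside the loop instead.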

python selenium selenium-webdriver web-scraping