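I am trying to scrape data from the following web page: https://skiplagged.com/flights/YTO/DXB/2020-08-21.
The elements I want to locate are: div[@class='infinite-trip-list']//div[@class='span1 trip-duration']
This is a list that adds elements dynamically as the user scrolls. My goal is to store these elements in a variable so that I can extract the duration of each flight. So far I have not been able to do this. Here is what I tried after reading several Stack Overflow posts about this kind of problem: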
import time

# `driver` is an already created Selenium WebDriver with the page above loaded
mylist = []
last = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    time.sleep(1)  # let the page load
    new = driver.execute_script("return document.body.scrollHeight")
    infinite_list = driver.find_elements_by_xpath("//div[@class='infinite-trip-list']//div[@class='span1 trip-duration']")
    for elem in infinite_list:
        if elem not in mylist:
            mylist.append(elem.text)
    if new == last:  # height did not change, so we have reached the end of the page
        break
    last = new  # remember the current height for the next iteration
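
For comparison, here is a minimal sketch of one possible variation, not a confirmed fix. In the attempt above, `elem not in mylist` compares a WebElement against the strings already stored, so it never filters anything and every pass appends the whole list again. The sketch scrolls first and reads the elements once at the end. It assumes the page keeps earlier rows in the DOM while new ones load, which I have not verified for this site. The helper names `scroll_to_end` and `collect_durations`, and the `pause` and `max_rounds` parameters, are hypothetical. The locator is passed as the plain string "xpath" (the value of `By.XPATH`) through `find_elements`, since `find_elements_by_xpath` is no longer available in newer Selenium 4 releases.

```python
import time

# XPath taken from the question above.
TRIP_DURATION_XPATH = (
    "//div[@class='infinite-trip-list']"
    "//div[@class='span1 trip-duration']"
)


def scroll_to_end(driver, pause=1.0, max_rounds=200):
    """Scroll down until the document height stops growing.

    Returns the number of scroll rounds performed. `max_rounds` is a
    hypothetical safety cap so the loop cannot run forever.
    """
    last = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load more rows
        new = driver.execute_script("return document.body.scrollHeight")
        if new == last:  # nothing new was loaded, assume the end was reached
            return rounds
        last = new
    return max_rounds


def collect_durations(driver, pause=1.0):
    """Scroll to the end, then read every duration cell exactly once.

    Assumption: rows loaded earlier are still in the DOM at this point.
    Flights with identical durations are kept, because nothing is
    de-duplicated by text.
    """
    scroll_to_end(driver, pause=pause)
    elems = driver.find_elements("xpath", TRIP_DURATION_XPATH)
    return [elem.text for elem in elems]
```

With a real driver this would be called as `durations = collect_durations(driver)` after `driver.get(...)`. If the site removes rows that scroll out of view, this approach would miss them and the texts would have to be gathered during scrolling instead.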