How can I scrape the information in this table using R?

Problem description

I am trying to scrape the following web page: https://www.timeanddate.com/weather/sweden/stockholm/historic?month=3&year=2020. I am interested in the last table, the one below "Stockholm Weather History for ...".

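With the code posted below I can get the information for the first day of the month, but I don't know how to get it for the following days. If I change the day in the dropdown list, the URL does not change. How can I scrape this table for every day of the month?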

library(tidyverse)  # also attaches stringr and dplyr
library(rvest)
library(RSelenium)

# Start a Selenium-driven Chrome session
rD <- rsDriver(browser = "chrome", port = 4234L, chromever = "85.0.4183.83")
remDr <- rD[["client"]]
remDr$navigate("https://www.timeanddate.com/weather/sweden/stockholm/historic?month=3&year=2020")

# Grab the text of the first element with class "sticky-wr"
webElems <- remDr$findElements(using = "class name", value = "sticky-wr")
s <- webElems[[1]]$getElementText()
s <- as.character(s)
print(s)

Solution

It looks like you can extract the table with rvest alone; RSelenium is not needed here. The table may need some cleaning, though.

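library(rvest)
url <- 'https://www.timeanddate.com/weather/sweden/stockholm/historic?month=3&year=2020'

# Read the page, parse every table on it, and keep the third one
url %>%
  read_html() %>%
  html_table() %>%
  .[[3]] %>%
  setNames(.[1,]) -> tmp  # the first row holds the real column names

# Drop the first row (now used as the column names) and the last row
tmp[-c(1,nrow(tmp)),]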

#             Time  Temp                    Weather    Wind   Humidity Barometer Visibility
#2  0:20.Aha 01 Mac  2 °C Light rain. Mostly cloudy. 20 km/h ↑      93%  988 mbar       5 km
#3            0:50.  2 °C       Drizzle. Low clouds. 13 km/h ↑      93%  988 mbar        N/A
#4            1:20.  2 °C       Drizzle. Low clouds. 15 km/h ↑     100%  987 mbar       9 km
#5            1:50.  2 °C       Drizzle. Low clouds. 15 km/h ↑     100%  987 mbar       8 km
#6            2:20.  2 °C    Light rain. Low clouds. 19 km/h ↑     100%  986 mbar       6 km
#7            2:50.  2 °C    Light rain. Low clouds. 19 km/h ↑     100%  985 mbar       4 km
#...
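
As a rough idea of what that cleaning might look like, here is a minimal base R sketch. It does not scrape anything: the `tmp` data frame is a hand-typed sample that mirrors a few of the rows printed above, and `first_number()` and the output column names (`Temp_C`, `Wind_kmh`, and so on) are hypothetical names introduced only for illustration. It assumes each cell starts with a plain ASCII number followed by its unit, as in the output above.

```r
# Hand-typed sample mirroring the printed rows above (assumption: not scraped live).
tmp <- data.frame(
  Time       = c("0:20.Aha 01 Mac", "0:50.", "1:20."),
  Temp       = c("2 \u00b0C", "2 \u00b0C", "2 \u00b0C"),  # "\u00b0" is the degree sign
  Wind       = c("20 km/h", "13 km/h", "15 km/h"),
  Humidity   = c("93%", "93%", "100%"),
  Barometer  = c("988 mbar", "988 mbar", "987 mbar"),
  Visibility = c("5 km", "N/A", "9 km"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first number found in each string,
# or NA when the string contains no number (e.g. "N/A").
first_number <- function(x) {
  hit <- regexpr("-?[0-9]+(\\.[0-9]+)?", x)
  out <- rep(NA_real_, length(x))
  out[hit > 0] <- as.numeric(regmatches(x, hit))
  out
}

clean <- data.frame(
  # Keep only the leading h:mm part of the Time column
  Time           = sub("^([0-9]{1,2}:[0-9]{2}).*$", "\\1", tmp$Time),
  Temp_C         = first_number(tmp$Temp),
  Wind_kmh       = first_number(tmp$Wind),
  Humidity_pct   = first_number(tmp$Humidity),
  Barometer_mbar = first_number(tmp$Barometer),
  Visibility_km  = first_number(tmp$Visibility),
  stringsAsFactors = FALSE
)

print(clean)
```

The same steps could be applied to the scraped `tmp[-c(1,nrow(tmp)),]`, though the real table may contain other cell formats that this sketch does not handle.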
