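Problem description
I am trying to scrape this website daily: https://www.basketball-reference.com/boxscores/?month=9&day=30&year=2020 As you can see, the date is not in a standard format, and I don't know how to get R to recognize it as a date.
This is my code so far: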
url <- "https://www.basketball-reference.com/Boxscores/"
timevalues <- seq(as.Date("month=10&day=2&year=2020"),as.Date("month=10&day=2&year=2020"),by = "day")
head(timevalues)
Error in charToDate(x) : character string is not in a standard unambiguous format
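Solution
You cannot generate a date sequence this way: as.Date() expects a string in a standard format such as "2020-10-02", not a URL query string. Build the sequence from real dates first, then assemble the URLs from their parts. Here is a solution using the lubridate package: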
library(lubridate)
url <- "https://www.basketball-reference.com/boxscores/"
my_dates <- seq(as.Date("2020-09-25"),as.Date("2020-10-05"),by = "day")
urls <- paste0(url,"?month=",month(my_dates),"&day=",day(my_dates),"&year=",year(my_dates))
urls
#> [1] "https://www.basketball-reference.com/boxscores/?month=9&day=25&year=2020"
#> [2] "https://www.basketball-reference.com/boxscores/?month=9&day=26&year=2020"
#> [3] "https://www.basketball-reference.com/boxscores/?month=9&day=27&year=2020"
#> [4] "https://www.basketball-reference.com/boxscores/?month=9&day=28&year=2020"
#> [5] "https://www.basketball-reference.com/boxscores/?month=9&day=29&year=2020"
#> [6] "https://www.basketball-reference.com/boxscores/?month=9&day=30&year=2020"
#> [7] "https://www.basketball-reference.com/boxscores/?month=10&day=1&year=2020"
#> [8] "https://www.basketball-reference.com/boxscores/?month=10&day=2&year=2020"
#> [9] "https://www.basketball-reference.com/boxscores/?month=10&day=3&year=2020"
#> [10] "https://www.basketball-reference.com/boxscores/?month=10&day=4&year=2020"
#> [11] "https://www.basketball-reference.com/boxscores/?month=10&day=5&year=2020"
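If adding lubridate as a dependency is not desirable, the same URLs can probably be built with base R alone. The following is only a sketch under that assumption; the names base_url and urls_base are illustrative and do not come from the answer above.

```r
# Base-R sketch: build the same URLs without lubridate.
base_url <- "https://www.basketball-reference.com/boxscores/"
my_dates <- seq(as.Date("2020-09-25"), as.Date("2020-10-05"), by = "day")

# format() returns zero-padded strings such as "09", so convert to integer
# to get the unpadded values used in the site's query string.
urls_base <- sprintf(
  "%s?month=%d&day=%d&year=%d",
  base_url,
  as.integer(format(my_dates, "%m")),
  as.integer(format(my_dates, "%d")),
  as.integer(format(my_dates, "%Y"))
)
```

sprintf() is vectorized over the date components and recycles base_url, so urls_base has one element per date.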
We can also use glue to build the URLs:
library(lubridate)
url <- "https://www.basketball-reference.com/boxscores/"
my_dates <- seq(as.Date("2020-09-25"), length.out = 11, by = "day")
urls <- glue::glue("{url}?month={month(my_dates)}&day={day(my_dates)}","&year={year(my_dates)}")
Output:
#https://www.basketball-reference.com/boxscores/?month=9&day=25&year=2020
#https://www.basketball-reference.com/boxscores/?month=9&day=26&year=2020
#https://www.basketball-reference.com/boxscores/?month=9&day=27&year=2020
#https://www.basketball-reference.com/boxscores/?month=9&day=28&year=2020
#https://www.basketball-reference.com/boxscores/?month=9&day=29&year=2020
#https://www.basketball-reference.com/boxscores/?month=9&day=30&year=2020
#https://www.basketball-reference.com/boxscores/?month=10&day=1&year=2020
#https://www.basketball-reference.com/boxscores/?month=10&day=2&year=2020
#https://www.basketball-reference.com/boxscores/?month=10&day=3&year=2020
#https://www.basketball-reference.com/boxscores/?month=10&day=4&year=2020
#https://www.basketball-reference.com/boxscores/?month=10&day=5&year=2020
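For the opposite direction, which is closer to what the question literally asks (getting R to recognize the query string as a date), one possible approach is to extract the three numbers and reassemble them into a standard format. The helper query_to_date below is hypothetical and assumes each of month=, day=, and year= appears exactly once in the input.

```r
# Hypothetical helper (not from the answers above): recover a Date from the
# month/day/year query parameters of a box-score URL or query string.
query_to_date <- function(x) {
  # Extract the digits that follow "<name>=" in each input string.
  get_param <- function(name) {
    as.integer(sub(paste0(".*", name, "=([0-9]+).*"), "\\1", x))
  }
  # Reassemble as year-month-day; as.Date() accepts unpadded month and day
  # when the format is given explicitly.
  as.Date(
    paste(get_param("year"), get_param("month"), get_param("day"), sep = "-"),
    format = "%Y-%m-%d"
  )
}

parsed <- query_to_date("month=10&day=2&year=2020")
```

Because sub() and paste() are vectorized, the same function can be applied to a whole vector of URLs, for example to label scraped results by date.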