Web scraping in R: getting "Error in records[[x]] ... more elements supplied than there are to replace"

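Problem

Please note that I am new to R and to web scraping in R, so please keep that in mind when explaining your answer.

I am trying to scrape the date of stay, the review title, and the review text.

This is where I generate the list of URLs: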

library(rvest)
library(stringr)  # str_c(), str_sub()
library(dplyr)    # data_frame(), bind_rows()

# Generating the URLs
# Create an empty list
webpage_list <- vector(mode = "list")
webpage_list

for(n in seq(from=5,to=15,by=5)){
  webpage_list[[n]] <- glue::glue("https://www.sampleURL.com#REVIEWS")
}

# Drop the empty (NULL) values
webpage_list[sapply(webpage_list,is.null)] <- NULL
webpage_list

I then convert the list to a character vector and loop over it to start identifying the regions of the page I want to scrape:

webpage_list2 <- unlist(webpage_list)
class(webpage_list2)

for(i in seq_along(webpage_list2)){
  webpage <- read_html(webpage_list2[i])

  results <- webpage %>% html_nodes(".oETBfkHU,._3hDPbqWO")
  print(results)

  # Building the dataset
  records <- vector("character",length = (length(results)))
  print(records)
}

Up to this point, everything seems to work the way I want.

for (x in seq_along(results)) {
    url <- read_html(webpage_list2[x])
    dateOfStay <- str_c(url %>% 
                          html_nodes("._34Xs-BQm") %>% 
                          html_text())
    reviewTitle <- str_sub(url %>%
                             html_nodes(".glasR4aX")%>%
                             html_text())
    review <- str_sub(url %>%
                        html_nodes(".irsGHoPm") %>%
                        html_text())
    records[[x]] <- data_frame(dateOfStay = dateOfStay,reviewTitle = reviewTitle,review = review)#,review = review
  }
#Build DF
DF <- bind_rows(records)

From this, I get the following error:

Error in records[[x]] <- data_frame(dateOfStay = dateOfStay,:    more elements supplied than there are to replace

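Any help would be greatly appreciated. Again, please keep in mind that I am new to R and to web scraping in R when explaining your answer.

Solution

We can find your problem without doing any scraping. You are trying to put a data frame inside a character vector. A data frame is not a character string: it is a list of columns, so it is the wrong size to fit into a single slot of a character vector. You can fix this either by making records a list, or by wrapping the data frame in a list to force it into a single item. I recommend making records a list.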
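To make the size mismatch concrete, here is a minimal sketch in base R. The variable names `df` and `msg` are only for illustration and do not appear in the question; the sketch assumes nothing beyond a default R session.

```r
# A character vector holds exactly one string per slot.
records <- vector("character", length = 3)
df <- data.frame(test = "A", test2 = "B")

# A data frame is a list of columns, so its length is the number of columns.
length(df)   # 2
is.list(df)  # TRUE

# Assigning the data frame into one slot of the atomic vector fails.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  {
    records[[2]] <- df
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```

Here `msg` should hold the same "more elements supplied than there are to replace" text as in the question, because two columns are being pushed into one slot.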

records <- vector("character",length = (3))
records[[2]] <- data.frame(test = "A",test2 = "B")
# Error in records[[2]] <- data.frame(test = "A",test2 = "B") : 
#   more elements supplied than there are to replace

# Option 1: make records a list with three empty slots
records <- vector("list", length = 3)
records[[2]] <- data.frame(test = "A",test2 = "B")
records
# [[1]]
# NULL
#
# [[2]]
#   test test2
# 1    A     B
#
# [[3]]
# NULL


# Option 2: wrap the data frame in a list so it counts as a single item
records <- vector("character",length = (3))
records[[2]] <- list(data.frame(test = "A",test2 = "B"))
records
# [[1]]
# [1] ""
#
# [[2]]
# [[2]][[1]]
#   test test2
# 1    A     B
#
#
# [[3]]
# [1] ""
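
Applied to the loop in the question, Option 1 might look roughly like the sketch below. It is an illustration under assumptions, not a tested scraper: the `pages` object is hypothetical stand-in data that replaces the `read_html()` / `html_nodes()` / `html_text()` calls so the example runs offline, and `do.call(rbind, ...)` is used as a base R stand-in for `bind_rows()`. Two things also differ from the original loop: `records` is created once, before the loop, and the loop runs over the pages (in the question it runs over `seq_along(results)` while indexing `webpage_list2[x]`, which may not be the same length).

```r
# Hypothetical stand-in for scraping: each element mimics the text
# extracted from one page (in the real loop this would come from html_text()).
pages <- list(
  list(dateOfStay  = c("May 2020", "June 2020"),
       reviewTitle = c("Great stay", "Too noisy"),
       review      = c("Loved it.", "Could not sleep.")),
  list(dateOfStay  = "July 2020",
       reviewTitle = "Fine",
       review      = "Nothing special.")
)

# One list slot per page, created once, outside the loop.
records <- vector("list", length = length(pages))

for (i in seq_along(pages)) {
  p <- pages[[i]]
  # A list slot can hold a whole data frame, so this assignment succeeds.
  records[[i]] <- data.frame(
    dateOfStay  = p$dateOfStay,
    reviewTitle = p$reviewTitle,
    review      = p$review,
    stringsAsFactors = FALSE
  )
}

# Stack the per-page data frames into one (all have the same columns).
DF <- do.call(rbind, records)
nrow(DF)  # 3
```

With dplyr loaded, `bind_rows(records)` can be used in place of the `do.call(rbind, records)` line, as in the question.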