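Problem description
I am following this tutorial to scrape information from IMDb: https://www.analyticsvidhya.com/blog/2017/03/beginners-guide-on-web-scraping-in-r-using-rvest-with-hands-on-knowledge/
However, some of the data is missing on IMDb. The tutorial suggests inspecting the page visually and then writing a loop like the following: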
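# Positions of the missing Metascores, found by inspecting the page by hand
for (i in c(39, 73, 80, 89)) {
  a <- Metascore_data[1:(i - 1)]                  # everything before the gap
  b <- Metascore_data[i:length(Metascore_data)]   # everything from the gap onward
  Metascore_data <- append(a, list("NA"))         # insert a placeholder at the gap
  Metascore_data <- append(Metascore_data, b)
}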
Is there a better way to handle this programmatically?
Solution
The following worked for me:
library(rvest)
URL <- 'https://www.imdb.com/search/title/?title_type=feature&online_availability=US/IMDbTV&start=1251&ref_=adv_nxt'
webpage <- read_html(URL)
genres <- webpage %>%
  html_nodes('span.genre') %>%
  html_text() %>%
  trimws()
This returns 50 values:
genres
# [1] "Comedy,Romance" "Action,Crime,Drama"
# [3] "Action,Horror,Sci-Fi" "Action,Adventure,Thriller"
# [5] "Adventure,Comedy,Family" "Comedy"
# [7] "Action,Thriller" "Comedy,Drama,Romance"
# [9] "Comedy" "Comedy"
#[11] "Action,Drama" "Action,Thriller"
#[13] "Action,Thriller" "Mystery,Thriller"
#[15] "Crime,Thriller" "Drama,Horror"
#[17] "Animation,War" "Drama,Thriller"
#[19] "Action,Drama" "Drama,Sci-Fi"
#[21] "Adventure,Family" "Crime,Drama"
#[23] "Action,Thriller" "Action,Sci-Fi"
#[25] "Thriller" "Comedy,Crime"
#[27] "Comedy,Biography,Drama"
#[29] "Adventure,Comedy" "Crime,Thriller"
#[31] "Drama,Sci-Fi,Thriller" "Comedy,Romance"
#[33] "Action,Thriller" "Action,Sci-Fi"
#[35] "Action,Drama" "Action,Drama"
#[37] "Action,Thriller" "Action,War"
#[39] "Drama,Thriller" "Animation,Family"
#[41] "Drama,Romance" "Action,Fantasy"
#[43] "Action,Fantasy" "Comedy,Drama"
#[45] "Action,Sci-Fi"
#[47] "Drama,Romance" "Animation,Family,Fantasy"
#[49] "Action,Fantasy" "Mystery,Thriller"
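
For fields that are missing on some titles, such as the Metascore in the question, one possible approach is to select each result's container first and then call `html_node()` (singular) on that set. It returns exactly one entry per container, and `html_text()` gives `NA` where nothing matched, so no positions need to be hard-coded. The sketch below is a minimal illustration under assumptions: the inline HTML is a made-up stand-in for a results page, and the class names `lister-item-content` and `metascore` follow the tutorial's selectors and may not match IMDb's current markup.

```r
library(rvest)

# Hypothetical stand-in for a results page: three items, the second has no Metascore.
page <- read_html('<html><body>
  <div class="lister-item-content"><h3><a>Film A</a></h3><span class="metascore"> 81 </span></div>
  <div class="lister-item-content"><h3><a>Film B</a></h3></div>
  <div class="lister-item-content"><h3><a>Film C</a></h3><span class="metascore"> 64 </span></div>
</body></html>')

# One node per result; the class name is an assumption taken from the tutorial.
items <- html_nodes(page, "div.lister-item-content")

# html_node() keeps one entry per item, so both vectors stay aligned.
titles <- items %>%
  html_node("h3 a") %>%
  html_text() %>%
  trimws()

metascore <- items %>%
  html_node("span.metascore") %>%
  html_text() %>%       # NA for items without a matching node
  trimws() %>%
  as.numeric()

movies <- data.frame(title = titles, metascore = metascore,
                     stringsAsFactors = FALSE)
```

Because every column is derived from the same `items` set, the vectors always have equal length and can go straight into a data frame, with `NA` marking the gaps.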