Problem description
I'm trying to scrape player information from a website with the following code:
#install required packages
if(!require(pacman))install.packages("pacman")
pacman::p_load('rvest','stringi','dplyr','tidyr','measurements','reshape2','foreach','doParallel','raster','curl','httr','Iso')
profile_detail <- read_html('https://www.pgatour.com/players/player.01006.john-adams.html#profile') %>%
  html_node("[class='s-header__bottom']") %>%
  html_children()
[1] <div class="s-header__no-data">No additional profile information available</div>
I'm not sure how to access the div with class "s-col".
Can anyone help me?
Thanks!
Solution
You can use html_nodes() with the div.s-col selector:
library(rvest)
url <- 'https://www.pgatour.com/players/player.06197.michael-allen.html'
url %>%
  read_html() %>%
  html_nodes('div.s-col') %>%           # every <div> whose class list includes "s-col"
  html_text() %>%
  gsub('\\h+', ' ', ., perl = TRUE) %>% # collapse runs of horizontal whitespace
  cat
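To try the selector without requesting pgatour.com (whose markup may have changed since this answer), the same pipeline can be run against a small inline HTML string. The markup below is a hypothetical stand-in for the profile header, not the site's actual HTML; read_html() treats a string containing tags as literal markup. In rvest 1.0 and later, html_elements() is the newer name for html_nodes(); both work here.

```r
library(rvest)

# Hypothetical markup imitating the profile header; the real page may differ.
page <- '<div class="s-header__bottom">
  <div class="s-col">   <p>6 ft, 0 in</p>   <p>Height</p>   </div>
  <div class="s-col">   <p>January 31, 1959</p>   <p>Birthday</p>   </div>
</div>'

cols <- page %>%
  read_html() %>%                     # a string with tags is parsed as literal HTML
  html_nodes('div.s-col') %>%         # class selector: matches any element whose classes include s-col
  html_text() %>%
  gsub('\\h+', ' ', ., perl = TRUE)   # collapse runs of horizontal whitespace

print(cols)
```

Note that div.s-col is a class selector, so it matches elements that carry other classes as well; an attribute selector such as [class='s-col'] only matches when the class attribute is exactly that string.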
I'm not sure how you want the final output to look, but this returns:
#Michael Allen
#Full Name
#6 ft, 0 in
#183 cm
#Height
#195 lbs
#89 kg
#Weight
#January 31, 1959
#Birthday
#61
#AGE
#San Mateo, California
#Birthplace
#Scottsdale, Arizona
#Residence
#Wife, Cynthia; Christy (12/8/93), Michelle (6/3/97)
#Family
#University of Nevada (1982, Horticulture)
#College
#1984
#Turned Pro
#16,963,593
#Career Earnings
#Paradise Valley, AZ, United States
#City Plays From
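If the goal is a table rather than printed text, one option is to treat the text of each div.s-col as a value followed by its label, which matches the order in the output above (for example, "Height" comes after "6 ft, 0 in" and "183 cm"). The sketch below assumes that, within each element, the last non-empty line is the label and everything before it is the value; cols_to_df is a hypothetical helper name, written in base R so it can be tried on plain strings.

```r
# Hypothetical helper: turn the text of each div.s-col into one (label, value) row.
# Assumption: the label is the last non-empty line of each element and the
# preceding lines are the value (e.g. "6 ft, 0 in" and "183 cm" for Height).
cols_to_df <- function(texts) {
  rows <- lapply(texts, function(txt) {
    lines <- trimws(strsplit(txt, "\n", fixed = TRUE)[[1]])
    lines <- lines[nzchar(lines)]
    if (length(lines) < 2) return(NULL)  # skip elements without a value/label pair
    data.frame(
      label = lines[length(lines)],
      value = paste(lines[-length(lines)], collapse = " / "),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)  # NULL entries are dropped by rbind
}

texts <- c("\n 6 ft, 0 in \n 183 cm \n Height \n",
           "\n January 31, 1959 \n Birthday \n")
profile <- cols_to_df(texts)
print(profile)
```

On a live page, the character vector produced by the answer's pipeline (before cat) would be passed in as texts.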
Note that some players' pages have no profile information at all.
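Because some pages (such as the john-adams page in the question) only contain the "No additional profile information available" placeholder, a scraper that loops over many players may want to detect that case explicitly. A minimal sketch, assuming the page structure shown in the question; scrape_profile is a hypothetical helper that accepts a URL or an HTML string and returns NULL when the page has no div.s-col elements. The two inline snippets are made-up test markup, not real pages.

```r
library(rvest)

# Hypothetical helper: whitespace-normalised text of each div.s-col element,
# or NULL when the page has no profile columns.
scrape_profile <- function(src) {
  cols <- html_nodes(read_html(src), "div.s-col")
  if (length(cols) == 0) {
    return(NULL)  # e.g. only the "s-header__no-data" placeholder is present
  }
  gsub("\\h+", " ", html_text(cols), perl = TRUE)
}

with_data <- '<div class="s-header__bottom"><div class="s-col"><p>1984</p> <p>Turned Pro</p></div></div>'
no_data <- '<div class="s-header__bottom"><div class="s-header__no-data">No additional profile information available</div></div>'

print(scrape_profile(with_data))
print(is.null(scrape_profile(no_data)))
```

Returning NULL rather than an error lets a caller skip such players, for example by dropping NULL results after an lapply() over several URLs.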