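Problem description
I am new to R and don't know much about HTML, XML, etc. I am trying to scrape a website that requires input from a drop-down menu. This is for an academic paper that applies text and sentiment analysis to press releases from members of Congress. I'm not a programmer, so please be gentle!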
memberUrl = 'https://grijalva.house.gov/press-releases/'
session <- html_session(memberUrl)
forms <- html_form(session)
yearForm <- forms[[4]]
#--- so far so good (I think) -- and i have successfully scraped sites that don't have drop downs
#--- but here is where I get confused and can't find a good tutorial on forms and submit_form
set_values(yearForm,??? ) #----- get stuck on how to use set_values
submit_form( session,yearForm,???) #--- and here
Thanks! Jim
Solution
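submit_form does not work here, probably because the form is submitted via JavaScript. A workaround is to send the POST request directly: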
library(rvest)
memberUrl = 'https://grijalva.house.gov/press-releases/'
session <- html_session(memberUrl)
# request_POST is an internal (unexported) rvest function, hence the ':::'
session <- rvest:::request_POST(session, memberUrl, body = list(
  getNewsByyear = "2018"  # change the year here; 'getNewsByyear' is the name of the drop-down list
))
# extract the press-release titles from the returned page
titles <- read_html(session) %>%
  html_nodes("ul > li > h3") %>%
  html_text()
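
The answer relies on knowing that the drop-down is named 'getNewsByyear'. As a rough sketch of how such a name and its allowed values might be found, the snippet below parses a small piece of HTML and lists every select element and its options. The HTML string is a made-up stand-in, not the real page; only the name 'getNewsByyear' comes from the answer, and the option values are assumptions for illustration.

```r
library(rvest)

# Hypothetical HTML standing in for the real page. Only the name
# 'getNewsByyear' is taken from the answer; the options are invented.
sample_html <- '
<form method="post">
  <select name="getNewsByyear">
    <option value="2018">2018</option>
    <option value="2017">2017</option>
  </select>
</form>'

page <- read_html(sample_html)

# Names of all drop-down lists found in the document
dropdown_names <- page %>%
  html_nodes("select") %>%
  html_attr("name")

# Values that could be sent for the 'getNewsByyear' drop-down
year_values <- page %>%
  html_nodes("select[name='getNewsByyear'] option") %>%
  html_attr("value")

print(dropdown_names)
print(year_values)
```

On the real site, the same two pipelines could be run on read_html(memberUrl) instead of the sample string, and each value found could then be placed in the body list of the POST request in turn. Whether the live page uses exactly this markup is not confirmed here.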