R web scraping with rvest: forms and submit_form

Problem description

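I'm new to R and don't know much about HTML, XML, and so on. I'm trying to scrape a website that requires input from a drop-down menu. This is for an academic paper that applies text and sentiment analysis to press releases from members of Congress. I'm not a programmer, so please be gentle!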

memberUrl = 'https://grijalva.house.gov/press-releases/'
session <- html_session(memberUrl)
forms <- html_form(session)
yearForm <- forms[[4]]
#--- so far so good (I think) -- and i have successfully scraped sites that don't have drop downs
#--- but here is where I get confused and can't find a good tutorial on forms and submit_form
set_values(yearForm,??? ) #----- get stuck on how to use set_values
submit_form( session,yearForm,???) #--- and here

Thanks! Jim

Solution

submit_form doesn't work here, probably because the form is submitted with JavaScript. Here is a workaround:

library(rvest)
memberUrl = 'https://grijalva.house.gov/press-releases/'
session <- html_session(memberUrl)

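# POST the chosen year back to the same URL. 'getNewsByyear' is the name
# of the dropdown list; change "2018" to the year you want.
# Note: request_POST is an internal (unexported) rvest function, hence the ':::'.
session <- rvest:::request_POST(session, memberUrl, body = list(
  getNewsByyear = "2018"
))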

titles <- read_html(session) %>%
  html_nodes("ul > li > h3") %>%
  html_text()
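
If you are wondering where a field name like 'getNewsByyear' comes from: it is the name attribute of the page's select element. Here is a minimal sketch of how you might look it up with rvest. The inline HTML is a hypothetical stand-in for the real page (whose markup may differ), so it runs without a network connection. On the live site you would run the same selectors against read_html(memberUrl).

```r
library(rvest)

# Hypothetical stand-in for the press-release page; the real markup may differ.
page <- read_html('
<form method="post">
  <select name="getNewsByyear">
    <option value="2017">2017</option>
    <option value="2018">2018</option>
  </select>
</form>')

# The name attribute of each dropdown is the key to use in the POST body.
select_names <- page %>%
  html_nodes("select") %>%
  html_attr("name")

# The value attributes are the choices that can be sent for that key.
year_values <- page %>%
  html_nodes("select[name='getNewsByyear'] option") %>%
  html_attr("value")

print(select_names)
print(year_values)
```

Once you know the name and the available values, you can put any of those values in the body list of the POST request above.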