Using R to mimic clicking "Download dataset" and save the file in a different folder

Problem description

I'm hoping someone can help me figure out how to scrape a .csv file that has no link.

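Clicking the "Download" button from R

I want R to download the .csv file that is generated when you click "Download dataset" next to the first table on this website: https://www.opentable.com/state-of-industry. The closest post I found to my problem is this one, but I cannot find an API link like the one used in that solution.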

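Second potential problem: saving the downloaded file to a different location

Ideally, I would like to load the file straight into R (similar to what the solution in the link above does). If the only way is to download it to my device and then read it into R, I would like the .csv file to be saved in a specific folder (for example C:\Documents\OpenTable), overwriting any existing file with the same name.

Thanks!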

Solution

That is because this page does not call any API: all the data in the CSV file is embedded in the page's JavaScript. You can find it in the <script> tag that contains covidDataCenter. To bring data created in JS into R, you need the V8 package. Then apply some transformations to the data:

    library(rvest)
    library(V8)
    library(dplyr)
    library(tidyr)

    pg <- read_html("https://www.opentable.com/state-of-industry")

    # Grab the text of the <script> tag that holds the data
    js <- pg %>%
      html_node(xpath = "//script[contains(.,'covidDataCenter')]") %>%
      html_text()

    ct <- V8::new_context()
    # The JS code assigns to a `window` object, so it has to be created first
    ct$eval("var window = {}")
    ct$eval(js)

    # This is where the data sets get their values
    data <- ct$get("window")$`__INITIAL_STATE__`$covidDataCenter$fullbook

    dates <- data$headers
    countries <- data$countries
    states <- data$states
    cities <- data$cities

    # It is not straightforward, but the data set can be rebuilt like this:
    countries_df <- countries %>%
      unnest(yoy) %>%
      group_by(name, id, size) %>%
      mutate(date = dates) %>%
      ungroup() %>%
      spread(date, yoy) %>%
      .[c("name", "id", "size", dates)] # arrange the columns

    # states and cities can be handled the same way

Finally, export the data frames to CSV files, for example with write.csv().
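To cover the second part of the question (saving into a specific folder and overwriting an existing file), here is a minimal sketch. The helpers `to_wide()` and `save_csv()` are hypothetical names that are not part of the original answer, and the toy data frame only stands in for `countries` and `dates`. It assumes that `states` and `cities` have the same `name`, `id`, `size` and `yoy` columns as `countries`, which has not been verified here. `pivot_wider()` is used instead of `spread()` because it keeps the date columns in order of first appearance, so no column reordering is needed.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not from the original answer): reshape a data set with
# a list-column `yoy` into one row per entity and one column per date.
to_wide <- function(df, dates) {
  df %>%
    unnest(yoy) %>%
    group_by(name, id, size) %>%
    mutate(date = dates) %>%
    ungroup() %>%
    pivot_wider(names_from = date, values_from = yoy)
}

# Hypothetical helper: write a data frame into `dir`, creating the folder
# first if it does not exist yet.
save_csv <- function(df, dir, file_name) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(dir, file_name)
  write.csv(df, path, row.names = FALSE) # replaces an existing file of that name
  invisible(path)
}

# Toy stand-in for `countries` and `dates`; the values are made up.
toy_dates <- c("2/18", "2/19", "2/20")
toy <- tibble(
  name = c("Country A", "Country B"),
  id   = c("a", "b"),
  size = c("country", "country"),
  yoy  = list(c(1, 0, -2), c(3, 4, 5))
)

out_dir <- file.path(tempdir(), "OpenTable") # swap in your own folder
toy_wide <- to_wide(toy, toy_dates)
csv_path <- save_csv(toy_wide, out_dir, "countries.csv")
```

With the real objects from the answer, the corresponding call would look like `save_csv(to_wide(states, dates), "C:/Documents/OpenTable", "states.csv")`. Since `write.csv()` always writes a fresh file, running it again replaces the previous copy.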
