Looping with dynamic variables in R

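Question

I have built an API query for a service, and I want to write a loop that steps through a series of dates and builds several final data frames. The code I have so far is: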

query1 <- "search publications in full_data for \"\\\"Education\\\"\" 
where type in [ \"article\" ] 
and (category_for.name ~\"Education\") 
and date_inserted >= \"2019-01\" and date_inserted < \"2019-02\"
return publications[type + all]"

x1 <- dsApiRequest(token = token, query = query1)  # send query1, not an undefined `query`
m1 <- dsApi2df(x1)                                 # convert that response to a data frame

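What I want to do is move the dates forward by 2 months on each pass, going from query1, x1, m1 up to queryn, xn, mn. Written out in full, the first two passes would look like this: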

query1 <- "search publications in full_data for \"\\\"Education\\\"\" 
where type in [ \"article\" ] 
and (category_for.name ~\"Education\") 
and date_inserted >= \"2019-01\" and date_inserted < \"2019-02\"
return publications[type + all]"

x1 <- dsApiRequest(token = token, query = query1)
m1 <- dsApi2df(x1)

THEN

query2 <- "search publications in full_data for \"\\\"Education\\\"\" 
where type in [ \"article\" ] 
and (category_for.name ~\"Education\") 
and date_inserted >= \"2019-03\" and date_inserted < \"2019-04\"
return publications[type + all]"

x2 <- dsApiRequest(token = token, query = query2)
m2 <- dsApi2df(x2)

Note that the dates also have to change on every pass.

Answer

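I like to use sprintf, which is a base R function, although the glue package is newer and has a nice interface. With sprintf, you put %s in the string as a placeholder, and the additional arguments supply the values that replace each placeholder. Because sprintf is vectorized, passing vectors of dates gives back one query string per date pair.

I have "simplified" your query to focus on the dates that change.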

query  <- "blah blah
and date_inserted >= \"%s\" and date_inserted < \"%s\"
return blah blah"

library(lubridate)
start_dates = seq(as.Date("2019-01-01"),as.Date("2020-09-01"),by = "2 months")
end_dates = start_dates + months(1) # lubridate is only used here for this nice months() function

query_vec = sprintf(query,format(start_dates,"%Y-%m"),format(end_dates,"%Y-%m"))
query_vec
# [1] "blah blah\nand date_inserted >= \"2019-01\" and date_inserted < \"2019-02\"\nreturn blah blah"
# [2] "blah blah\nand date_inserted >= \"2019-03\" and date_inserted < \"2019-04\"\nreturn blah blah"
# [3] "blah blah\nand date_inserted >= \"2019-05\" and date_inserted < \"2019-06\"\nreturn blah blah"
# ...
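With the vector of queries in hand, the requests themselves can be run in one pass and the results kept in a list rather than in numbered variables such as x1, x2, and so on. The following is only a minimal sketch under stated assumptions: fetch_one is a hypothetical stand-in that returns a small data frame so the example runs without a token or network access. In real use its body would presumably be dsApi2df(dsApiRequest(token = token, query = q)), as in the question. The end dates are built with a second base R seq() call here so the sketch does not need lubridate.

```r
# Hypothetical placeholder for the real API call; swap the body for
# dsApi2df(dsApiRequest(token = token, query = q)) when a token is available.
fetch_one <- function(q) {
  data.frame(query = q, n_chars = nchar(q), stringsAsFactors = FALSE)
}

query <- "blah blah
and date_inserted >= \"%s\" and date_inserted < \"%s\"
return blah blah"

# One start date every 2 months; each end date is the following month.
start_dates <- seq(as.Date("2019-01-01"), as.Date("2020-09-01"), by = "2 months")
end_dates   <- seq(as.Date("2019-02-01"), as.Date("2020-10-01"), by = "2 months")

query_vec <- sprintf(query, format(start_dates, "%Y-%m"), format(end_dates, "%Y-%m"))

# Run every query and keep the results in a list named by start month.
results <- lapply(query_vec, fetch_one)
names(results) <- format(start_dates, "%Y-%m")

# Pull out a single period by name.
march <- results[["2019-03"]]
```

A named list keeps each period addressable (results[["2019-03"]]) without creating variables dynamically, and if the data frames share the same columns they can be stacked afterwards with do.call(rbind, results).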

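With glue, you put variable names inside {braces} in the string, and they are filled in automatically when you call glue(). (Somewhat confusingly, the result prints without quotes, but it is still a character vector and works as usual.) This uses the same start_dates and end_dates as above. Because those are full Date values, they are inserted as year-month-day.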

library(glue)
glue_query = "blah blah
and date_inserted >= \"{start_dates}\" and date_inserted < \"{end_dates}\"
return blah blah"

query_vec = glue(glue_query)
query_vec
# blah blah
# and date_inserted >= "2019-01-01" and date_inserted < "2019-02-01"
# return blah blah
# blah blah
# and date_inserted >= "2019-03-01" and date_inserted < "2019-04-01"
# return blah blah
# ...
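The glue output above contains full dates such as 2019-01-01, whereas the original query used year-month values such as 2019-01. One way to get the shorter form, sketched here as an assumption about what the API expects, is to format the dates first and refer to the formatted vectors in the template. The names start_ym and end_ym are introduced only for this illustration, and the end dates are again built with base R seq() instead of lubridate.

```r
library(glue)

start_dates <- seq(as.Date("2019-01-01"), as.Date("2020-09-01"), by = "2 months")
end_dates   <- seq(as.Date("2019-02-01"), as.Date("2020-10-01"), by = "2 months")

# Illustrative names: the dates reduced to "YYYY-MM" strings.
start_ym <- format(start_dates, "%Y-%m")
end_ym   <- format(end_dates, "%Y-%m")

glue_query <- "blah blah
and date_inserted >= \"{start_ym}\" and date_inserted < \"{end_ym}\"
return blah blah"

# glue() is vectorized over the variables in braces: one string per date pair.
query_vec <- glue(glue_query)

# Convert to a plain character vector if a downstream function is strict about class.
query_chr <- as.character(query_vec)
```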