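Problem description
I have set up an API query against a service, and I would like to create a loop that steps through dates and builds several final data frames. The code I have so far is: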
query1 <- "search publications in full_data for \"\\\"Education\\\"\"
where type in [ \"article\" ]
and (category_for.name ~\"Education\")
and date_inserted >= \"2019-01\" and date_inserted < \"2019-02\"
return publications[type + all]"
x1 <- dsApiRequest(token = token, query = query1)
m1 <- dsApi2df(x1)
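What I would like to do is move the date window forward by two months on each pass, going from query1, x1, and m1 up to queryn, xn, and mn. Written out in full, the first two passes would look like this: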
query1 <- "search publications in full_data for \"\\\"Education\\\"\"
where type in [ \"article\" ]
and (category_for.name ~\"Education\")
and date_inserted >= \"2019-01\" and date_inserted < \"2019-02\"
return publications[type + all]"
x1 <- dsApiRequest(token = token, query = query1)
m1 <- dsApi2df(x1)
then
query2 <- "search publications in full_data for \"\\\"Education\\\"\"
where type in [ \"article\" ]
and (category_for.name ~\"Education\")
and date_inserted >= \"2019-03\" and date_inserted < \"2019-04\"
return publications[type + all]"
x2 <- dsApiRequest(token = token, query = query2)
m2 <- dsApi2df(x2)
Note that the dates also have to change on every pass.
Solution
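Although the glue package is newer and has a nice interface, I still like to use sprintf for this, which is part of base R. With sprintf, you use %s as a placeholder inside the string, and the additional arguments supply the values that replace each placeholder. It is vectorized, so passing vectors of dates returns one query string per date pair.
I've "simplified" your query to focus on the dates that change.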
query <- "blah blah
and date_inserted >= \"%s\" and date_inserted < \"%s\"
return blah blah"
library(lubridate)
start_dates = seq(as.Date("2019-01-01"),as.Date("2020-09-01"),by = "2 months")
end_dates = start_dates + months(1) # lubridate is only used here for this nice months() function
query_vec = sprintf(query,format(start_dates,"%Y-%m"),format(end_dates,"%Y-%m"))
query_vec
# [1] "blah blah\nand date_inserted >= \"2019-01\" and date_inserted < \"2019-02\"\nreturn blah blah"
# [2] "blah blah\nand date_inserted >= \"2019-03\" and date_inserted < \"2019-04\"\nreturn blah blah"
# [3] "blah blah\nand date_inserted >= \"2019-05\" and date_inserted < \"2019-06\"\nreturn blah blah"
# ...
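Once the queries are in a vector, the numbered variables from the question (x1, m1, ..., xn, mn) can be replaced by a single named list with one data frame per query. The following is only a sketch in base R: fetch_one is a hypothetical stand-in that I made up so the example runs without an API token, and the call shown in its comment assumes dsApiRequest and dsApi2df behave as they are used in the question. Only three date windows are built here to keep the example small.

```r
# Month windows built with base R only: each window starts two months after the previous one.
start_dates <- seq(as.Date("2019-01-01"), by = "2 months", length.out = 3)
end_dates   <- seq(as.Date("2019-02-01"), by = "2 months", length.out = 3)

query <- "blah blah
and date_inserted >= \"%s\" and date_inserted < \"%s\"
return blah blah"
query_vec <- sprintf(query, format(start_dates, "%Y-%m"), format(end_dates, "%Y-%m"))

# Hypothetical stand-in for the real API calls. With the functions from the
# question, the body would presumably be:
#   dsApi2df(dsApiRequest(token = token, query = q))
fetch_one <- function(q) {
  data.frame(query = q, stringsAsFactors = FALSE)
}

# One data frame per query, kept together in a named list
# instead of separate variables m1, m2, ..., mn.
results <- lapply(query_vec, fetch_one)
names(results) <- format(start_dates, "%Y-%m")

# If the data frames share the same columns, they can also be stacked into one.
combined <- do.call(rbind, results)
```

Individual results are then available by name, for example results[["2019-03"]], which tends to be easier to work with than a growing set of numbered variables.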
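With glue, you put variable names in {braces} inside the string, and they are filled in automatically when you call glue(). (Somewhat confusingly, the result prints without quotes, but it is still a character vector and still works fine.) This uses the same start_dates and end_dates as above. Note that the dates are inserted in full, as "2019-01-01" rather than "2019-01", because they are not passed through format() first.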
library(glue)
glue_query = "blah blah
and date_inserted >= \"{start_dates}\" and date_inserted < \"{end_dates}\"
return blah blah"
query_vec = glue(glue_query)
query_vec
# blah blah
# and date_inserted >= "2019-01-01" and date_inserted < "2019-02-01"
# return blah blah
# blah blah
# and date_inserted >= "2019-03-01" and date_inserted < "2019-04-01"
# return blah blah
# ...
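If the API needs the year-month form used in the question rather than full dates, one option is to format the dates before calling glue(). This is a sketch under that assumption; start_months and end_months are names introduced here for illustration, and the dates are rebuilt with base R so the example does not depend on lubridate.

```r
library(glue)

start_dates <- seq(as.Date("2019-01-01"), by = "2 months", length.out = 3)
end_dates   <- seq(as.Date("2019-02-01"), by = "2 months", length.out = 3)

# Pre-format the dates so glue() inserts "2019-01" rather than "2019-01-01".
start_months <- format(start_dates, "%Y-%m")
end_months   <- format(end_dates, "%Y-%m")

glue_query <- "blah blah
and date_inserted >= \"{start_months}\" and date_inserted < \"{end_months}\"
return blah blah"

# glue() is vectorized over the variables it interpolates: one query per window.
query_vec <- glue(glue_query)
```

The resulting query_vec can be passed to the API functions in the same way as the sprintf version.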