Problem description
I want to create a new variable in dbplyr using a rolling function (rolling mean, rolling standard deviation, etc.).
library(odbc)
library(DBI)
library(tidyverse)
library(dbplyr)  # provides in_schema(); not attached by library(tidyverse)
library(zoo)
con <- DBI::dbConnect(odbc::odbc(),
                      Driver = "SQL Server",
                      Server = "xx.xxx.xxx.xxx",
                      Database = "stock",
                      UID = "userid",
                      PWD = "userpassword")
startday = 20150101
day = tbl(con, in_schema("dbo", "LogDay"))
I want to calculate a 5-day rolling mean. This is my code, but it doesn't work.
How can I solve this problem?
library(zoo)
day %>%
  mutate(ma5 = rollmean(priceClose, k = 5, fill = NA))
error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]키워드 'AS' 근처의 구문이 [Microsoft][ODBC SQL Server Driver][SQL Server]문을 준비할 수
<sql> 'SELECT TOP 11 "logNo","stockCode","logDate","priceOpen","priceHigh","priceLow","priceClose","adjRate","volume","amount","numListed","remark","marketCap","foreignRate","personNetbuy","foreignNetbuy","instNetbuy","financeNetbuy","insuranceNetbuy","toosinNetbuy","bankNetbuy","gitaFinanceNetbuy","pensionNetbuy","gitaInstNetbuy","gitaForeignNetbuy","samoNetbuy","nationNetbuy",rollmean("priceClose",5.0 AS "k",NULL AS "fill") AS "ma5"
FROM "dbo"."LogDay"
WHERE ("logDate" > 20150101.0)
ORDER BY "stockCode"'
Warning:
Named arguments ignored for SQL rollmean
Solution
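The error occurs because rollmean has no dbplyr translation defined, and it is not an SQL command that can be used without translation either. This is not surprising: rollmean comes from the zoo package, whereas dbplyr focuses on translating dplyr and base R commands.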
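As a rough illustration of that point, dbplyr can show the SQL it would generate without touching a real database. The sketch below assumes a hypothetical one-column stand-in for the table (built with lazy_frame and simulate_mssql, neither of which appears in the question) and simply checks that rollmean is copied into the SQL as-is, which is the call SQL Server then rejects:

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for dbo.LogDay: a lazy table backed by a simulated
# SQL Server connection, so no real database is needed.
lf <- lazy_frame(priceClose = 1, con = simulate_mssql())

# rollmean() has no dbplyr translation, so it is passed through to the SQL.
# dbplyr may warn about the named arguments; that warning is silenced here.
query <- suppressWarnings(
  lf %>%
    mutate(ma5 = rollmean(priceClose, k = 5, fill = NA)) %>%
    sql_render()
)

print(query)
```

Inspecting the rendered query this way (or with show_query() on the real table) is a cheap check for whether a function has been translated before sending anything to the server.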
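Part of what you are after is window functions. dplyr has a range of window functions, and so does SQL, but the translation between them is not always straightforward. There are, however, ways to do this using commands that do have translations defined.
Two possible approaches to consider:
(1) Combine lag and lead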
df %>%
  mutate(prev2_price = lag(priceClose, 2, order_by = date),
         prev1_price = lag(priceClose, 1, order_by = date),
         next1_price = lead(priceClose, 1, order_by = date),
         next2_price = lead(priceClose, 2, order_by = date)) %>%
  mutate(ma5 = (prev2_price + prev1_price + priceClose + next1_price + next2_price) / 5)
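This approach does not scale well, but it is simple and easy to reason about. If you want to work within groups (e.g. a separate moving average for each stock), apply group_by before using lag and lead.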
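The grouped variant can be tried on local data first. This is a minimal sketch on a hypothetical toy tibble (the prices data and its values are made up for illustration); it computes a centred 5-row average per stock and leaves NA wherever the window runs off either end:

```r
library(dplyr)

# Hypothetical toy data: two stocks, seven consecutive days each.
prices <- tibble(
  stockCode  = rep(c("A", "B"), each = 7),
  date       = rep(1:7, times = 2),
  priceClose = c(1:7, seq(10, 70, by = 10))
)

centered <- prices %>%
  group_by(stockCode) %>%            # lag()/lead() now stay within each stock
  mutate(ma5 = (lag(priceClose, 2, order_by = date) +
                lag(priceClose, 1, order_by = date) +
                priceClose +
                lead(priceClose, 1, order_by = date) +
                lead(priceClose, 2, order_by = date)) / 5) %>%
  ungroup()

print(centered)
```

Because lag and lead return NA outside the group, the first two and last two rows of each stock have no average, which is usually the desired behaviour for a full-width window.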
(2) Join and filter out unwanted records
df2 = df %>%
  select(stockCode, date, priceClose)

df %>%
  inner_join(df2, by = "stockCode", suffix = c("", "_2")) %>%
  filter(abs(date - date_2) <= 2) %>% # two records either side = window of width 5
  # date is included in the grouping so that each record keeps its own average
  group_by(stockCode, date, priceClose) %>%
  summarise(ma5 = mean(priceClose_2))
This approach is more general, but can be harder to reason about.
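One thing that makes the join harder to reason about is what happens at the edges. The following sketch runs the join on a hypothetical toy tibble (made-up values, integer dates with no gaps) and adds an n_obs column, which is an assumption of this example rather than part of the answer, to show how many records each average is based on:

```r
library(dplyr)

# Hypothetical toy data: two stocks, seven consecutive days each.
prices <- tibble(
  stockCode  = rep(c("A", "B"), each = 7),
  date       = rep(1:7, times = 2),
  priceClose = c(1:7, seq(10, 70, by = 10))
)

neighbours <- prices %>%
  select(stockCode, date, priceClose)

joined <- prices %>%
  # Every record is paired with every record of the same stock.
  # Recent dplyr versions may warn about a many-to-many join; that is expected.
  inner_join(neighbours, by = "stockCode", suffix = c("", "_2")) %>%
  filter(abs(date - date_2) <= 2) %>%      # keep neighbours within 2 days
  group_by(stockCode, date, priceClose) %>%
  summarise(ma5 = mean(priceClose_2), n_obs = n(), .groups = "drop")

print(joined)
```

Unlike the lag and lead version, rows near the start and end of each stock get an average over fewer than five records instead of NA; filtering on n_obs == 5 would restore the stricter behaviour. The window here is defined by date distance, so gaps in the dates (weekends, holidays) also shrink it.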
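Another answer: bring the filtered data into R and work on it through dtplyr's lazy_dt(), where rollmean from zoo can be used directly.
library(dtplyr)
library(zoo)

day = tbl(con, in_schema("dbo", "LogDay")) %>%
  filter(logDate > startday) %>%
  collect() %>%  # assumption: pull the filtered rows into R before lazy_dt()
  lazy_dt()

dayt = day %>%
  group_by(stockCode) %>%
  arrange(logDate) %>%
  # The 1/0 values in the two ifelse() calls are assumed; the original
  # arguments were lost in the source.
  mutate(rise = (priceClose / lag(priceClose, 1) - 1) * 100,
         candle = ifelse(priceClose > priceOpen, 1, 0),
         middle = ifelse(priceClose > (priceHigh + priceLow) / 2, 1, 0),
         ma5 = rollmean(priceClose, k = 5, fill = NA, align = 'right'),
         ovnprofit = lead(priceOpen, 1) / priceClose,
         disparity = priceClose / ma5 * 100)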