Rolling mean and standard deviation in dbplyr

Problem description

I want to create a new variable in dbplyr using rolling functions (rolling mean, standard deviation, and so on).

Here is the database connection:

library(odbc)
library(DBI)
library(tidyverse)
library(dbplyr)   # provides in_schema(); not attached by library(tidyverse)
library(zoo)

con <- DBI::dbConnect(
  odbc::odbc(),
  Driver   = "sql Server",
  Server   = "xx.xxx.xxx.xxx",
  Database = "stock",
  UID      = "userid",
  PWD      = "userpassword"
)

startday = 20150101
day = tbl(con,in_schema("dbo","LogDay")) 


I want to compute a 5-day rolling mean. Here is my code, but it does not work.

How can I fix this?

library(zoo)    
day %>% 
      mutate(ma5 = rollmean(priceClose,k = 5,fill = NA))

error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC sql Server Driver][sql Server]Incorrect syntax near the keyword 'AS'.  [Microsoft][ODBC sql Server Driver][sql Server]Statement(s) could not be prepared.
    <sql> 'SELECT TOP 11 "logNo","stockCode","logDate","priceOpen","priceHigh","priceLow","priceClose","adjRate","volume","amount","numListed","remark","marketCap","foreignRate","personNetbuy","foreignNetbuy","instNetbuy","financeNetbuy","insuranceNetbuy","toosinNetbuy","bankNetbuy","gitaFinanceNetbuy","pensionNetbuy","gitaInstNetbuy","gitaForeignNetbuy","samoNetbuy","nationNetbuy",rollmean("priceClose",5.0 AS "k",NULL AS "fill") AS "ma5"
    FROM "dbo"."LogDay"
    WHERE ("logDate" > 20150101.0)
    ORDER BY "stockCode"'
    Warning : 
    Named arguments ignored for sql rollmean

Solution

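The error occurs because dbplyr has no translation defined for rollmean, and rollmean is not an SQL command that can be used without translation. dbplyr therefore passes the call through unchanged, and SQL Server rejects it. This is not surprising: rollmean is part of the zoo package, while dbplyr focuses on translating dplyr and base R commands.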
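One way to see this without touching the database is to render the SQL against a simulated backend. The sketch below assumes dbplyr's lazy_frame() and simulate_mssql() helpers; the table lf and its single row are made up for illustration, and the rollmean call uses positional arguments to keep the output free of the named-argument warning.

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for the LogDay table; no database connection is needed.
lf <- lazy_frame(
  stockCode = "A", logDate = 20150102L, priceClose = 100,
  con = simulate_mssql()
)

# rollmean is unknown to dbplyr, so it is passed through as if it were a SQL function.
bad_sql <- lf %>%
  mutate(ma5 = rollmean(priceClose, 5)) %>%
  sql_render() %>%
  as.character()

# lag has a translation, so it is rendered as a SQL window function.
good_sql <- lf %>%
  mutate(prev = lag(priceClose, 1, order_by = logDate)) %>%
  sql_render() %>%
  as.character()

cat(bad_sql, "\n\n", good_sql, "\n")
```

The first query still contains a literal rollmean(...) call, which is what SQL Server fails on; the second contains LAG(...) OVER (...).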

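Part of what you are after is window functions. dplyr has a range of window functions, and so does SQL, but the translation between them is not always straightforward. There are, however, ways to do this using commands that do have translations defined.

Two possible approaches to consider:

(1) Combine lag and lead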

# df is the remote table; date is its date column
df %>%
  mutate(
    prev2_price = lag(priceClose, 2, order_by = date),
    prev1_price = lag(priceClose, 1, order_by = date),
    next1_price = lead(priceClose, 1, order_by = date),
    next2_price = lead(priceClose, 2, order_by = date)
  ) %>%
  mutate(ma5 = (prev2_price + prev1_price + priceClose + next1_price + next2_price) / 5)

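This approach does not scale well to wider windows, but it is simple and easy to reason about. If you want to work within groups (for example, a separate moving average for each stock), apply group_by before using lag and lead.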
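The same verbs also run on a local tibble, which makes it easy to check the grouped logic before pointing it at the database. This is a minimal sketch with made-up prices for two hypothetical stocks; only the column names stockCode, date, and priceClose come from the code above.

```r
library(dplyr)

# Made-up data: two stocks, five consecutive days each.
prices <- tibble(
  stockCode  = rep(c("A", "B"), each = 5),
  date       = rep(1:5, times = 2),
  priceClose = c(10, 20, 30, 40, 50, 1, 2, 3, 4, 5)
)

ma <- prices %>%
  group_by(stockCode) %>%   # lag and lead now stay within each stock
  mutate(
    prev2_price = lag(priceClose, 2, order_by = date),
    prev1_price = lag(priceClose, 1, order_by = date),
    next1_price = lead(priceClose, 1, order_by = date),
    next2_price = lead(priceClose, 2, order_by = date)
  ) %>%
  mutate(ma5 = (prev2_price + prev1_price + priceClose + next1_price + next2_price) / 5) %>%
  ungroup()

print(select(ma, stockCode, date, priceClose, ma5))
```

Only the middle day of each stock has two neighbours on both sides, so ma5 is 30 for stock A and 3 for stock B on day 3 and NA everywhere else. Without group_by, the window would run across the boundary between the two stocks.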

(2) Join and filter out unwanted records

df2 = df %>%
  select(stockCode, date, priceClose)

df %>%
  inner_join(df2, by = "stockCode", suffix = c("", "_2")) %>%
  # two records either side = window of width 5 (assumes consecutive daily records)
  filter(abs(date - date_2) <= 2) %>%
  # group by date as well, so days with the same closing price are not merged
  group_by(stockCode, date, priceClose) %>%
  summarise(ma5 = mean(priceClose_2))

This approach is more general, but can be harder to reason about.
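Beyond the two approaches above, dbplyr also exports window_order() and window_frame(), which let an ordinary aggregate such as mean() be rendered with an explicit SQL window frame. The following is only a sketch, not part of the answer above: it renders SQL against a simulated SQL Server backend using a made-up one-row table, and it computes a trailing window (the current row plus the four before it) rather than the centered window used in approaches (1) and (2).

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for the LogDay table; no database connection is needed.
lf <- lazy_frame(
  stockCode = "A", logDate = 20150102L, priceClose = 100,
  con = simulate_mssql()
)

rolling <- lf %>%
  group_by(stockCode) %>%     # becomes PARTITION BY
  window_order(logDate) %>%   # becomes ORDER BY inside the window
  window_frame(-4, 0) %>%     # current row plus the four preceding rows
  mutate(
    ma5 = mean(priceClose, na.rm = TRUE),
    sd5 = sd(priceClose, na.rm = TRUE)
  )

rolling_sql <- as.character(sql_render(rolling))
cat(rolling_sql, "\n")
```

Unlike rollmean with fill = NA, a SQL window frame does not return NULL for the first four rows of each stock; it averages whatever rows are available. A centered window would use window_frame(-2, 2) instead.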

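Alternatively, pull the filtered rows into memory and compute the rolling mean there with dtplyr and zoo:

library(dtplyr)

# collect() brings the filtered rows into R; lazy_dt() needs a local data frame
day = tbl(con,in_schema("dbo","LogDay")) %>% filter(logDate > startday) %>% collect() %>% lazy_dt()

dayt = day %>%
   group_by(stockCode) %>%
   arrange(logDate) %>%
   mutate(
      rise = (priceClose/lag(priceClose,1)-1)*100,
      # the 1/0 flag values below are assumed; the original arguments were lost
      candle = ifelse(priceClose > priceOpen,1,0),
      middle = ifelse(priceClose > (priceHigh + priceLow)/2,1,0),
      ma5 = rollmean(priceClose,k = 5,fill = NA,align = 'right'),
      ovnprofit = lead(priceOpen,1)/priceClose,
      disparity = priceClose/ma5*100
   )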
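For reference, here is what rollmean returns with the arguments used above, shown on a small made-up vector. With align = 'right' each value is the mean of the current element and the four before it, and fill = NA pads the positions where a full window is not available.

```r
library(zoo)

# Made-up closing prices for six consecutive days.
x <- c(1, 2, 3, 4, 5, 6)

ma5 <- rollmean(x, k = 5, fill = NA, align = "right")
print(ma5)
# NA NA NA NA  3  4
```

The first four positions are NA because fewer than five observations are available there.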