Rolling stepwise regression with dplyr

Problem description

I want to run a rolling stepwise regression using dplyr's do() together with rollapply(). My data looks like this:

FUND_DATA <- tibble(
  DATE = 1:10,FUND1 = rnorm(10),FUND2 = rnorm(10),FUND3 = rnorm(10),FUND4 = rnorm(10))

These are just some price quotes for the funds over periods 1-10. The independent variables look the same:

FACTORS <- tibble(
  DATE = 1:10,x1 = rnorm(10),x2 = rnorm(10),x3 = rnorm(10),x4 = rnorm(10))

Now I merge the two tibbles above as follows:

REG_DATA <- FUND_DATA %>%
  pivot_longer(contains("FUND"),names_to = "FUND",values_to = "PRICE") %>% arrange(FUND,DATE) %>% left_join(.,FACTORS,by = "DATE") %>%  
  group_by(FUND) %>% mutate(RET = PRICE/lag(PRICE)-1) %>% drop_na()

So I end up with a long tibble, grouped by fund.

# A tibble: 36 x 8
# Groups:   FUND [4]
    DATE FUND    PRICE       x1     x2      x3      x4     RET
   <int> <chr>   <dbl>    <dbl>  <dbl>   <dbl>   <dbl>   <dbl>
 1     2 FUND1 -1.19   -0.422   -0.872 -0.292  -0.176  -2.04  
 2     3 FUND1 -0.869   1.60     0.247 -0.610   0.170  -0.272 
 3     4 FUND1 -1.60    0.159   -0.757  0.730  -0.154   0.839 
 4     5 FUND1 -1.58   -0.688   -0.718  0.778   0.879  -0.0103
 5     6 FUND1  1.14   -0.00190 -0.956  1.14   -0.953  -1.72  
 6     7 FUND1 -0.452   0.730   -0.344  0.925  -0.593  -1.40  
 7     8 FUND1 -0.809   0.895   -0.987 -0.0791 -0.0133  0.792 
 8     9 FUND1  1.06   -0.503    1.06   1.96    0.362  -2.31  
 9    10 FUND1  0.0358  0.359   -0.370  1.27    0.129  -0.966 
10     2 FUND2 -0.525  -0.422   -0.872 -0.292  -0.176  -0.229 
# ... with 26 more rows

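On this data I want to run a rolling stepwise regression for each fund and store the R^2 for every rolling window and fund, so a stepwise regression should be run for each window. I came up with the following code: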

ROLLING <- REG_DATA %>% group_by(FUND) %>% do(R2 = rollapply(.,width = 2,function(x){
  summary(step(lm(RET ~ x1+x2+x3+x4,data = .),direction = "both",trace = 0))$r.squared
  },by.column = FALSE,align = "right"))

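The code runs without error, but the output is the problem. It stores only the R^2 of the last rolling window (periods 8-10) and, I think, overwrites the other windows, so the result looks like this: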

FUND1   c(0.675,0.675,...)
FUND2   c(0.447,0.447,...)
FUND3   .....

Can you help me get the code to store the R^2 of each window?

Solution

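Here is one possible solution for your task, although it uses neither do() nor step(). The approach is to split the funds into separate list items, convert each one to a daily time series, and work from there: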

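library(dplyr)
library(tidyr)
library(zoo)
library(purrr)
# plyr and xts are only called with their namespace prefix (plyr::ldply, xts::xts);
# attaching plyr after dplyr would mask dplyr::mutate() and ignore the grouping below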

# your dummy data
FUND_DATA <- tibble(
  DATE = 1:10,FUND1 = rnorm(10),FUND2 = rnorm(10),FUND3 = rnorm(10),FUND4 = rnorm(10))
# your dummy data
FACTORS <- tibble(
  DATE = 1:10,x1 = rnorm(10),x2 = rnorm(10),x3 = rnorm(10),x4 = rnorm(10))

# first part of your code (had to split it to use it later for naming)
REG_DATA <- FUND_DATA %>%
  tidyr::pivot_longer(contains("FUND"),names_to = "FUND",values_to = "PRICE") %>%
  dplyr::arrange(FUND,DATE) %>% 
  dplyr::left_join(.,FACTORS,by = "DATE") 

# turn it into a list of time series
lts <-  REG_DATA %>%  
  # core data of timeseries is a matrix and allows only one data type (we prefer numeric thus cut "FUND" and preserve only the number)
  dplyr::mutate(FUND = as.numeric(substr(FUND,5,5))) %>% 
  group_by(FUND) %>% 
  mutate(RET = PRICE/lag(PRICE)-1) %>% 
  drop_na() %>%
  # split by groups into list items
  dplyr::group_split() %>% 
  # convert each list item to a daily time series with one date per row
  # (use nrow(.x); length(.x) on a tibble counts columns, not rows)
  purrr::map( ~ xts::xts(.x,order.by = seq(as.Date("2020-01-01"),by = "day",length.out = nrow(.x))))

# map rollapply over the time series and extract R² => !!! width must exceed the number of coefficients (4 explanatory variables + intercept = 5), so 6 is the minimum that leaves a residual degree of freedom
res <- purrr::map(lts,~ rollapply(.x,width = 6,FUN = function(x) 
                  summary(lm(RET ~ x1+x2+x3+x4,data = as.data.frame(x)))$r.squared,by.column = FALSE,align = "right"))

# deconstruct the time series to a data.frame (there might be a better way)
res2 <- purrr::map(res,~ data.frame(TS = zoo::index(.x),R2 = zoo::coredata(.x))) 

# get the unique FUND names and assign them as list item names (you could use a vector instead)
names(res2) <- unique(REG_DATA$FUND)

# condense the list items into one data.frame; the names assigned above become the .id column
plyr::ldply(res2)


     .id         TS        R2
1  FUND1 2020-01-01        NA
2  FUND1 2020-01-02        NA
3  FUND1 2020-01-03        NA
4  FUND1 2020-01-04        NA
5  FUND1 2020-01-05        NA
6  FUND1 2020-01-06 0.3556052
7  FUND1 2020-01-07 0.7670353
8  FUND1 2020-01-08 0.9077215
9  FUND1 2020-01-09 0.9758644
10 FUND2 2020-01-01        NA
11 FUND2 2020-01-02        NA
12 FUND2 2020-01-03        NA
13 FUND2 2020-01-04        NA
14 FUND2 2020-01-05        NA
15 FUND2 2020-01-06 0.8021993
16 FUND2 2020-01-07 0.8755639
17 FUND2 2020-01-08 0.8206098
18 FUND2 2020-01-09 0.8296576
19 FUND3 2020-01-01        NA
20 FUND3 2020-01-02        NA
21 FUND3 2020-01-03        NA
22 FUND3 2020-01-04        NA
23 FUND3 2020-01-05        NA
24 FUND3 2020-01-06 0.4545569
25 FUND3 2020-01-07 0.4172101
26 FUND3 2020-01-08 0.3604151
27 FUND3 2020-01-09 0.9877962
28 FUND4 2020-01-01        NA
29 FUND4 2020-01-02        NA
30 FUND4 2020-01-03        NA
31 FUND4 2020-01-04        NA
32 FUND4 2020-01-05        NA
33 FUND4 2020-01-06 0.9541878
34 FUND4 2020-01-07 0.9973588
35 FUND4 2020-01-08 0.9991080
36 FUND4 2020-01-09 0.9965382
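
The comment in the code above says the window width should be larger than 2 and that 6 seems to be the minimum. A minimal sketch of why, using made-up data (the data frame `d` and the seed are assumptions for illustration only): with four regressors plus an intercept there are five coefficients, so a five-row window is fitted exactly and its R² carries no information.

```r
# Hypothetical data: 6 observations, 4 regressors, and an unrelated response
set.seed(42)
d <- data.frame(x1 = rnorm(6), x2 = rnorm(6), x3 = rnorm(6), x4 = rnorm(6))
d$RET <- rnorm(6)

# 5 rows and 5 coefficients: the model interpolates the data
fit5 <- lm(RET ~ x1 + x2 + x3 + x4, data = d[1:5, ])
# 6 rows and 5 coefficients: one residual degree of freedom remains
fit6 <- lm(RET ~ x1 + x2 + x3 + x4, data = d)

df.residual(fit5)                  # no residual degrees of freedom
r2_5 <- summary(fit5)$r.squared    # numerically 1, regardless of the data
df.residual(fit6)
r2_6 <- summary(fit6)$r.squared    # below 1
```

Even at width 6 there is only a single residual degree of freedom, so a wider window is probably preferable on real data.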

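If step() should stay in the loop, a rough sketch in base R follows. In the question's code, lm() is given `data = .`, which is the whole group rather than the window `x` passed to the function; that would explain why every window returns the same value. The helper name `rolling_step_r2`, the simulated data, and the width of 12 are assumptions made for illustration and are not part of the original answer.

```r
# Hypothetical data for a single fund: 30 periods, 4 factors
set.seed(1)
n <- 30
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$RET <- 2 * dat$x1 + rnorm(n, sd = 0.5)

# rolling_step_r2() is a hypothetical helper, not part of any package.
# It slides a right-aligned window over the rows and refits on each window.
rolling_step_r2 <- function(df, width) {
  starts <- seq_len(nrow(df) - width + 1)
  vapply(starts, function(s) {
    win <- df[s:(s + width - 1), ]                    # rows of the current window only
    full <- lm(RET ~ x1 + x2 + x3 + x4, data = win)   # fit on the window, not the whole group
    best <- step(full, direction = "both", trace = 0) # stepwise selection by AIC
    summary(best)$r.squared
  }, numeric(1))
}

r2 <- rolling_step_r2(dat, width = 12)
length(r2)  # one R^2 per window: n - width + 1 values
```

For several funds, the same helper could presumably be applied per group, for example with split() and lapply(). Note that step() can stop with an error when a window is fitted perfectly, which is another reason to keep the window comfortably wider than the number of coefficients.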