Problem description
I want to run a rolling stepwise regression using dplyr, do() and rollapply(). My data looks like this:
FUND_DATA <- tibble(
DATE = 1:10,FUND1 = rnorm(10),FUND2 = rnorm(10),FUND3 = rnorm(10),FUND4 = rnorm(10))
These are just some (random) price quotes for four funds over periods 1-10. The independent variables look the same:
FACTORS <- tibble(
DATE = 1:10,x1 = rnorm(10),x2 = rnorm(10),x3 = rnorm(10),x4 = rnorm(10))
Now I combine the two tibbles above as follows:
REG_DATA <- FUND_DATA %>%
pivot_longer(contains("FUND"),names_to = "FUND",values_to = "PRICE") %>% arrange(FUND,DATE) %>% left_join(.,FACTORS,by = "DATE") %>%
group_by(FUND) %>% mutate(RET = PRICE/lag(PRICE)-1) %>% drop_na()
So I end up with a long tibble that is grouped by fund:
# A tibble: 36 x 8
# Groups: FUND [4]
DATE FUND PRICE x1 x2 x3 x4 RET
<int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2 FUND1 -1.19 -0.422 -0.872 -0.292 -0.176 -2.04
2 3 FUND1 -0.869 1.60 0.247 -0.610 0.170 -0.272
3 4 FUND1 -1.60 0.159 -0.757 0.730 -0.154 0.839
4 5 FUND1 -1.58 -0.688 -0.718 0.778 0.879 -0.0103
5 6 FUND1 1.14 -0.00190 -0.956 1.14 -0.953 -1.72
6 7 FUND1 -0.452 0.730 -0.344 0.925 -0.593 -1.40
7 8 FUND1 -0.809 0.895 -0.987 -0.0791 -0.0133 0.792
8 9 FUND1 1.06 -0.503 1.06 1.96 0.362 -2.31
9 10 FUND1 0.0358 0.359 -0.370 1.27 0.129 -0.966
10 2 FUND2 -0.525 -0.422 -0.872 -0.292 -0.176 -0.229
# ... with 26 more rows
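On this data I want to run a rolling stepwise regression for each fund and store the R^2 for every rolling window and fund, so a stepwise regression should be run for each window. I came up with the following code: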
ROLLING <- REG_DATA %>% group_by(FUND) %>% do(R2 = rollapply(.,width = 2,function(x){
summary(step(lm(RET ~ x1+x2+x3+x4,data = .),direction = "both",trace = 0))$r.squared
},by.column = FALSE,align = "right"))
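The code runs without errors, but the output is the problem. It seems to store only the R^2 of the last rolling window (periods 8-10) and, I think, overwrites the other windows, so it looks like this: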
FUND1 c(0.675,0.675,...)
FUND2 c(0.447,0.447,...)
FUND3 .....
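Can you help me make the code store the R^2 for each window?
Solution
Here is one possible solution for your task, although it does not use do() or step(). The approach is to split the funds into separate list items, convert each one to a daily time series, and work from there.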
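Before the full script, a note on why the code in the question returns the same R^2 for every window. The likely reason is not overwriting: inside the anonymous function the model is fitted with data = ., and in do() the dot is the whole group, so the window argument x is never used. The toy example below is only an illustration with made-up numbers (the matrix m and both result names are hypothetical) and shows the same effect with a rolling mean instead of a regression.

```r
library(zoo)

# toy data, made up for illustration only
m <- cbind(y = c(1, 3, 2, 5, 4, 6), x = 1:6)

# FUN ignores its window argument w and always uses the full data m,
# which mirrors `data = .` inside do(): every window yields the same number
ignores_window <- rollapplyr(m, width = 3, by.column = FALSE,
                             FUN = function(w) mean(m[, "y"]))

# FUN uses the window w it receives: one value per window
uses_window <- rollapplyr(m, width = 3, by.column = FALSE,
                          FUN = function(w) mean(w[, "y"]))

print(ignores_window)
print(uses_window)
```

Simply swapping . for x in the question's code is probably not enough, because rollapply() works on a matrix and the character column FUND would turn every value into a string. That is why the script that follows converts FUND to a number first.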
library(dplyr)
library(tidyr)
library(zoo)
library(purrr)
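# plyr is deliberately not attached: attaching it after dplyr masks dplyr::mutate() and would break the grouped mutate below; plyr::ldply() is called with its namespace instead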
# your dummy data
FUND_DATA <- tibble(
DATE = 1:10,FUND1 = rnorm(10),FUND2 = rnorm(10),FUND3 = rnorm(10),FUND4 = rnorm(10))
# your dummy data
FACTORS <- tibble(
DATE = 1:10,x1 = rnorm(10),x2 = rnorm(10),x3 = rnorm(10),x4 = rnorm(10))
# first part of your code (had to split it to use it later for naming)
REG_DATA <- FUND_DATA %>%
tidyr::pivot_longer(contains("FUND"),names_to = "FUND",values_to = "PRICE") %>%
dplyr::arrange(FUND,DATE) %>%
dplyr::left_join(.,FACTORS,by = "DATE")
# turn it into a list of time series
lts <- REG_DATA %>%
# the core data of a time series is a matrix and allows only one data type (we prefer numeric, so cut "FUND" and keep only the number)
dplyr::mutate(FUND = as.numeric(substr(FUND,5,5))) %>%
dplyr::group_by(FUND) %>%
# dplyr:: is spelled out so the grouped version is used even if another package masks mutate() or lag()
dplyr::mutate(RET = PRICE/dplyr::lag(PRICE)-1) %>%
tidyr::drop_na() %>%
# split by groups into list items
dplyr::group_split() %>%
# convert each list item to a daily time series with one date per row (nrow() is needed here: length() of a data frame counts columns, not rows)
purrr::map( ~ xts::xts(.x,order.by = seq(as.Date("2020-01-01"),by = "day",length.out = nrow(.x))))
# map rollapply over the time series and extract R² => !!! width must be larger than 2: with 4 explanatory variables plus the intercept, lm() estimates 5 coefficients, so 6 observations are the minimum that leaves a residual degree of freedom
res <- purrr::map(lts,~ rollapply(.x,width = 6,FUN = function(x)
summary(lm(RET ~ x1+x2+x3+x4,data = as.data.frame(x)))$r.squared,by.column = FALSE,align = "right"))
# deconstruct the time series to a data.frame (there might be a better way)
res2 <- purrr::map(res,~ data.frame(TS = zoo::index(.x),R2 = zoo::coredata(.x)))
# get the unique FUND names and assign them as list item names (you could use a vector instead)
names(res2) <- unique(REG_DATA$FUND)
# condense the list items into one data.frame, using the names assigned above as the .id column
plyr::ldply(res2)
.id TS R2
1 FUND1 2020-01-01 NA
2 FUND1 2020-01-02 NA
3 FUND1 2020-01-03 NA
4 FUND1 2020-01-04 NA
5 FUND1 2020-01-05 NA
6 FUND1 2020-01-06 0.3556052
7 FUND1 2020-01-07 0.7670353
8 FUND1 2020-01-08 0.9077215
9 FUND1 2020-01-09 0.9758644
10 FUND2 2020-01-01 NA
11 FUND2 2020-01-02 NA
12 FUND2 2020-01-03 NA
13 FUND2 2020-01-04 NA
14 FUND2 2020-01-05 NA
15 FUND2 2020-01-06 0.8021993
16 FUND2 2020-01-07 0.8755639
17 FUND2 2020-01-08 0.8206098
18 FUND2 2020-01-09 0.8296576
19 FUND3 2020-01-01 NA
20 FUND3 2020-01-02 NA
21 FUND3 2020-01-03 NA
22 FUND3 2020-01-04 NA
23 FUND3 2020-01-05 NA
24 FUND3 2020-01-06 0.4545569
25 FUND3 2020-01-07 0.4172101
26 FUND3 2020-01-08 0.3604151
27 FUND3 2020-01-09 0.9877962
28 FUND4 2020-01-01 NA
29 FUND4 2020-01-02 NA
30 FUND4 2020-01-03 NA
31 FUND4 2020-01-04 NA
32 FUND4 2020-01-05 NA
33 FUND4 2020-01-06 0.9541878
34 FUND4 2020-01-07 0.9973588
35 FUND4 2020-01-08 0.9991080
36 FUND4 2020-01-09 0.9965382
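If you would rather keep step(), one option is to roll over row indices instead of over the data itself, so that each window stays a data frame and no conversion to a matrix or time series is needed. The following is a minimal sketch under some assumptions, not a tested drop-in replacement: it reuses the dummy data from above with a fixed seed, the helpers window_r2 and rolling_r2 are hypothetical names, and dplyr::group_modify() stands in for do().

```r
library(dplyr)
library(tidyr)
library(zoo)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible

FUND_DATA <- tibble(DATE = 1:10, FUND1 = rnorm(10), FUND2 = rnorm(10),
                    FUND3 = rnorm(10), FUND4 = rnorm(10))
FACTORS <- tibble(DATE = 1:10, x1 = rnorm(10), x2 = rnorm(10),
                  x3 = rnorm(10), x4 = rnorm(10))

REG_DATA <- FUND_DATA %>%
  pivot_longer(contains("FUND"), names_to = "FUND", values_to = "PRICE") %>%
  arrange(FUND, DATE) %>%
  left_join(FACTORS, by = "DATE") %>%
  group_by(FUND) %>%
  mutate(RET = PRICE / dplyr::lag(PRICE) - 1) %>%
  drop_na() %>%
  ungroup()

# hypothetical helper: stepwise regression on one window, returns its R^2
window_r2 <- function(window_df) {
  # the formula is created inside this function, so step() can find
  # window_df again when it refits candidate models
  fit <- lm(RET ~ x1 + x2 + x3 + x4, data = window_df)
  selected <- step(fit, direction = "both", trace = 0)
  summary(selected)$r.squared
}

# hypothetical helper: right-aligned rolling windows over one fund's rows
rolling_r2 <- function(fund_df, width) {
  r2 <- rollapplyr(seq_len(nrow(fund_df)), width = width, fill = NA,
                   FUN = function(idx) window_r2(fund_df[idx, ]))
  tibble(DATE = fund_df$DATE, R2 = as.numeric(r2))
}

ROLLING <- REG_DATA %>%
  group_by(FUND) %>%
  group_modify(~ rolling_r2(.x, width = 6)) %>%
  ungroup()

print(ROLLING)
```

ROLLING has one row per fund and date, with NA until the first full window is available. With only 6 observations per window and up to 5 coefficients, the R^2 values will tend to be very high whatever step() selects, so a wider window is advisable on real data.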