Problem
My code is as follows:
get_postcoefs <- function(portfo){
  my_dat <- prerank_betas %>%
    filter(portfo == portfo) %>%
    lm(ret ~ ewr, my_dat) %>%
    coef %>%
    as.list %>%
    as_data_frame
}
postrank <- prerank_betas %>%
  group_by(portfo) %>%
  do(get_postcoefs(.$portfo))
The data frame I am using looks like this:
dput(head(prerank_betas,10))
structure(list(permco = c(3, 4, 5, 6, 7, 8, 9, 11, 12, 13),
  pre_beta = c(0.754759259550561, 0.631020855428056, 0.963497668377108, 1.42359914669436, 1.88321141160762, 0.137054776055511, 1.04141132820461, 0.170163365604386, 1.07633721793778, 1.05016503010496),
  ret = c(0.021630734879652, 0.00867405735757635, 0.0157192335910029, 0.0163030885650139, 0.017402600558639, 0.0182427638210356, 0.015755719798324, 0.0348026989282579, 0.0120230854319578, 0.016944221076395),
  me = c(12.3938081896552, 603.599033139535, 36.6372490671642, 20.481490497076, 2918.12852836134, 1.89075555555556, 1.21730113636364, 5.5216014957265, 116.021340472028, 8.22907327586207),
  ewr = c(0.454914743929347, 0.65175605642766, 1.04015768854358, 1.54966348955938, 1.46542203513179, 0.874404877119168, 0.934768449855933, 0.296266764535612, 0.949971716508229, 1.31022003302531),
  beta_rank = c(3L, 3L, 5L, 8L, 10L, 1L, 6L, 1L, 6L, 6L),
  portfo = c(4L, 10L, 6L, 5L, 10L, 1L, 1L, 2L, 8L, 3L)),
  row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
# A tibble: 10 x 7
permco pre_beta ret me ewr beta_rank portfo
<dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 3 0.755 0.0216 12.4 0.455 3 4
2 4 0.631 0.00867 604. 0.652 3 10
3 5 0.963 0.0157 36.6 1.04 5 6
4 6 1.42 0.0163 20.5 1.55 8 5
5 7 1.88 0.0174 2918. 1.47 10 10
6 8 0.137 0.0182 1.89 0.874 1 1
7 9 1.04 0.0158 1.22 0.935 6 1
8 11 0.170 0.0348 5.52 0.296 1 2
9 12 1.08 0.0120 116. 0.950 6 8
10 13 1.05 0.0169 8.23 1.31 6 3
I get the following error message:
Error in as.data.frame.default(data) :
cannot coerce class ‘"formula"’ to a data.frame
How do I need to adjust my code to make it work?
Solution
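From the OP's code, the goal appears to be running lm() separately for each portfolio and collecting the coefficients of all the portfolio regressions in one output data frame.
As noted in the comments, the code in the original post contains a quasiquotation (data-masking) conflict in filter(): inside filter(), both sides of portfo == portfo resolve to the portfo column, so the condition is always TRUE and no rows are dropped. The error message itself comes from the next step: the pipe inserts the filtered data frame as the first argument of lm(), so the call becomes lm(<filtered data>, ret ~ ewr, my_dat). The data frame lands in lm()'s formula argument and ret ~ ewr lands in its data argument, and lm() fails when it tries to coerce that formula to a data frame.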
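For comparison, here is a minimal sketch of the OP's own function with those two problems addressed. It assumes a recent dplyr (1.0 or later) for the .data/.env pronouns: .data$portfo is the column and .env$portfo is the function argument. The piped data is passed to lm() explicitly as data = ., so the pipe no longer fills lm()'s formula argument. as_data_frame() is swapped for as_tibble(), since as_data_frame() is deprecated in current tibble. The small prerank_betas below is an assumption, built from the ret/ewr values of the modified sample data used later in this answer, not the OP's full data.

```r
library(dplyr)

# Assumed stand-in for prerank_betas: the same six ret/ewr rows copied into portfolios 1 and 2
prerank_betas <- tibble(
  ret    = rep(c(0.0216, 0.00867, 0.0157, 0.0163, 0.0174, 0.0182), times = 2),
  ewr    = rep(c(0.455, 0.652, 1.04, 1.55, 1.47, 0.874), times = 2),
  portfo = rep(1:2, each = 6)
)

get_postcoefs <- function(portfo) {
  prerank_betas %>%
    # .data$portfo is the column, .env$portfo is this function's argument
    filter(.data$portfo == .env$portfo) %>%
    # `data = .` keeps the pipe from inserting the data as lm()'s first (formula) argument
    lm(ret ~ ewr, data = .) %>%
    coef() %>%
    as.list() %>%
    as_tibble()
}

# One row of coefficients per portfolio, then label the rows
portfolios <- sort(unique(prerank_betas$portfo))
postrank <- bind_rows(lapply(portfolios, get_postcoefs)) %>%
  mutate(portfo = portfolios)
postrank
```

Because both toy portfolios contain identical rows, both rows of postrank should carry the same coefficients.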
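Since the question did not include a minimal reproducible example, here is one way to run linear models on the mtcars data frame using purrr::map() and broom::tidy().
Because we split the data by mtcars$cyl, the filter() function used in the OP's code is not needed.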
library(dplyr)
library(purrr)
library(broom)
mtcars %>%
  split(.$cyl) %>%
  purrr::map(., function(x){
    lm(mpg ~ wt, data = x) %>%
      tidy(.)
  }) -> results
# combine into a data frame
df <- as.data.frame(do.call(rbind, results))
# extract cyl from rownames
df$cyl <- substr(rownames(df), 1, 1)
...and the output:
term estimate std.error statistic p.value cyl
4.1 (Intercept) 39.571196 4.3465820 9.103980 7.771511e-06 4
4.2 wt -5.647025 1.8501185 -3.052251 1.374278e-02 4
6.1 (Intercept) 28.408845 4.1843688 6.789278 1.054844e-03 6
6.2 wt -2.780106 1.3349173 -2.082605 9.175766e-02 6
8.1 (Intercept) 23.868029 3.0054619 7.941551 4.052705e-06 8
8.2 wt -2.192438 0.7392393 -2.965803 1.179281e-02 8
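A variant of the same pipeline that avoids parsing row names: dplyr::bind_rows() with its .id argument copies each list name ("4", "6", "8", as created by split()) into a column. The substr(rownames(df), 1, 1) step above only works while every group label is a single character, so this is a sketch of a sturdier way to combine the results, not a change to the modelling.

```r
library(dplyr)
library(purrr)
library(broom)

# One tidy() coefficient table per cylinder group; split() names the list "4", "6", "8"
results <- mtcars %>%
  split(.$cyl) %>%
  map(~ tidy(lm(mpg ~ wt, data = .x)))

# .id turns each list name into a `cyl` column, so no rowname parsing is needed
df <- bind_rows(results, .id = "cyl")
df
```

The estimates should match the table above; only the way the group label is recovered differs.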
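Solution with the original poster's data
After modifying the recently posted data so that each portfo value has at least 5 observations, the solution for the stock data looks like this.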
textData <- "id permco pre_beta ret me ewr beta_rank portfo
1 3 0.755 0.0216 12.4 0.455 3 1
2 4 0.631 0.00867 604. 0.652 3 1
3 5 0.963 0.0157 36.6 1.04 5 1
4 6 1.42 0.0163 20.5 1.55 8 1
5 7 1.88 0.0174 2918. 1.47 10 1
6 8 0.137 0.0182 1.89 0.874 1 1
7 3 0.755 0.0216 12.4 0.455 3 2
8 4 0.631 0.00867 604. 0.652 3 2
9 5 0.963 0.0157 36.6 1.04 5 2
10 6 1.42 0.0163 20.5 1.55 8 2
11 7 1.88 0.0174 2918. 1.47 10 2
12 8 0.137 0.0182 1.89 0.874 1 2"
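Note: by duplicating the values and changing the portfo identifier, we can demonstrate the solution while knowing that the two resulting models must have exactly the same coefficients, because their input data are identical.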
prerank_betas <- read.table(text=textData,header=TRUE)
library(dplyr)
library(purrr)
library(broom)
prerank_betas %>%
  split(.$portfo) %>%
  purrr::map(., function(x){
    lm(ret ~ ewr, data = x) %>%
      tidy(.)
  }) -> results
# combine into a data frame
df <- as.data.frame(do.call(rbind, results))
# drop the trailing term index from row names such as "1.1" or "10.2" to recover portfo
df$portfo <- as.numeric(gsub(".$", "", rownames(df)))
df
...and the output:
term estimate std.error statistic p.value portfo
1.1 (Intercept) 1.629081e-02 0.005291043 3.078941068 0.03696967 1
1.2 ewr 2.071675e-05 0.004884269 0.004241526 0.99681887 1
2.1 (Intercept) 1.629081e-02 0.005291043 3.078941068 0.03696967 2
2.2 ewr 2.071675e-05 0.004884269 0.004241526 0.99681887 2
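If purrr and broom are not available, a base-R sketch can produce the same per-portfolio coefficient table: split() the data, fit lm() on each piece, and read the Estimate, Std. Error, t value and Pr(>|t|) columns from coef(summary(fit)). The column names term, estimate, std.error, statistic and p.value are chosen here to mirror broom::tidy() and are an assumption of this sketch. The data frame is rebuilt compactly from the same values as textData above, keeping only the columns the model needs.

```r
# Same ret/ewr values as textData above; portfolios 1 and 2 are identical copies
prerank_betas <- data.frame(
  ret    = rep(c(0.0216, 0.00867, 0.0157, 0.0163, 0.0174, 0.0182), times = 2),
  ewr    = rep(c(0.455, 0.652, 1.04, 1.55, 1.47, 0.874), times = 2),
  portfo = rep(1:2, each = 6)
)

fits <- lapply(split(prerank_betas, prerank_betas$portfo), function(d) {
  # coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
  ct <- coef(summary(lm(ret ~ ewr, data = d)))
  data.frame(
    portfo    = d$portfo[1],
    term      = rownames(ct),
    estimate  = ct[, "Estimate"],
    std.error = ct[, "Std. Error"],
    statistic = ct[, "t value"],
    p.value   = ct[, "Pr(>|t|)"],
    stringsAsFactors = FALSE
  )
})

# stack the per-portfolio tables and replace the generated row names with 1..n
df <- do.call(rbind, fits)
rownames(df) <- NULL
df
```

Here portfo is stored as a regular column while each model is fitted, so it never has to be recovered from row names.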