Splitting a panel regression in R to obtain a single overall result

Problem description

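I am working with a large database, but for illustration I use the Grunfeld data.

My goal is to split the data into several parts so that my model can run; otherwise I run out of memory (it would need 90,000 GB). I use splm on the real data, but since it works together with plm, I use the latter for the example. Once I manage to run each chunk, I would like to obtain one overall result.

So far I have this: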

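library(plm)
data("Grunfeld", package = "plm")
Grunfeld <- pdata.frame(Grunfeld, index = c("firm", "year"))
# Assign rows to four groups (a random ordering of 1:4, recycled over the rows)
s1 <- split(Grunfeld, sample(rep(1:4)))
fm <- value ~ capital
# Fit a within (fixed-effects) model on each chunk
fix <- lapply(seq_along(s1), function(x) plm(fm, data = s1[[x]], model = "within"))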

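Now I have a list, fix, holding the coefficients and residuals of each chunk's fit.

Is there a way to write a function so that my result mimics the solution for the full database, rather than for the 4 chunks?

That is: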

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-1299.602   -88.290   -10.197    84.142  1324.118 

Coefficients:
        Estimate Std. Error t-value  Pr(>|t|)    
capital 0.551055   0.098634  5.5869 7.971e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    23078000
Residual Sum of Squares: 19807000
R-Squared:      0.14174
Adj. R-Squared: 0.09633
F-statistic: 31.213 on 1 and 189 DF, p-value: 7.9714e-08

Solution

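Consider splitting on the data frame's nrow. The code below divides the data into four chunks of roughly equal size based on the number of rows.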

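# Rows per chunk for four roughly equal chunks
num <- ceiling(nrow(Grunfeld) / 4)
# Chunk label (1 to 4) for every row
chunks <- ceiling(seq_len(nrow(Grunfeld)) / num)
fm <- value ~ capital

# Split by chunk label and fit a within model on each piece
df_list <- split(Grunfeld, chunks)
fix <- lapply(df_list, function(df) plm(fm, data = df, model = "within"))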
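
One thing worth checking with row-based chunks is whether a chunk boundary cuts through the middle of a firm, since a within model demeans each firm using only the rows it sees. The following is a minimal base-R sketch on a hypothetical toy index; the `firm` vector, the chunk sizes, and names such as `firm_chunks` are assumptions for illustration and are not part of the original answer.

```r
# Hypothetical toy panel index: 5 firms x 4 years, sorted by firm
firm <- rep(1:5, each = 4)
n <- length(firm)

# Row-based labels, built the same way as `chunks` above
num <- ceiling(n / 4)
chunks <- ceiling(seq_len(n) / num)

# Number of distinct chunks each firm lands in; > 1 means the firm was cut
chunks_per_firm <- tapply(chunks, firm, function(z) length(unique(z)))
print(chunks_per_firm)

# Firm-based labels: number the firms, then group the firm numbers
firm_no <- match(firm, unique(firm))
firm_chunks <- ceiling(firm_no / 2)   # up to 2 firms per chunk (arbitrary choice)
per_firm_after <- tapply(firm_chunks, firm, function(z) length(unique(z)))
print(per_firm_after)
```

If the first table shows values above 1, passing a firm-based label such as `firm_chunks` to split() keeps every firm's time series inside one chunk.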

An alternative to split + lapply is by:

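num <- ceiling(nrow(Grunfeld) / 4)
chunks <- ceiling(seq_len(nrow(Grunfeld)) / num)
fm <- value ~ capital

# by() splits the rows by chunk label and applies the function to each piece
fix <- by(Grunfeld, chunks, function(df) plm(fm, data = df, model = "within"))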
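
Neither variant above merges the per-chunk fits into one overall result. For the plain within model with a single regressor, one possible approach is to keep only a few sums from each chunk and add them up, because the within estimator is least squares on firm-demeaned data. The sketch below assumes that every firm sits entirely inside one chunk and that the chunk size (3 firms) is an arbitrary choice; names such as `pieces`, `totals` and `beta_pooled` are hypothetical. It is an illustration under those assumptions, not the original answer's method, and whether anything comparable holds for splm's spatial models is not established here.

```r
library(plm)
data("Grunfeld", package = "plm")   # plain data.frame, firm and year as columns

# Assign whole firms to chunks so no firm's time series is cut in two
firms <- unique(Grunfeld$firm)
firm_chunk <- ceiling(seq_along(firms) / 3)        # up to 3 firms per chunk
chunk_id <- firm_chunk[match(Grunfeld$firm, firms)]

# Per chunk: demean within firm, then keep only a handful of sums
pieces <- lapply(split(Grunfeld, chunk_id), function(df) {
  x <- df$capital - ave(df$capital, df$firm)
  y <- df$value   - ave(df$value,   df$firm)
  c(sxx = sum(x * x), sxy = sum(x * y), syy = sum(y * y),
    n = nrow(df), g = length(unique(df$firm)))
})

# Add the sums across chunks and form the pooled slope and standard error
totals <- Reduce(`+`, pieces)
beta_pooled <- unname(totals["sxy"] / totals["sxx"])
rss <- totals["syy"] - beta_pooled * totals["sxy"]
df_resid <- totals["n"] - totals["g"] - 1          # rows - firms - 1 regressor
se_pooled <- unname(sqrt(rss / df_resid / totals["sxx"]))

# Compare with a single fit on the full data
full <- plm(value ~ capital, data = Grunfeld,
            index = c("firm", "year"), model = "within")
beta_full <- unname(coef(full))
se_full <- unname(sqrt(diag(vcov(full))))
print(all.equal(beta_pooled, beta_full))
print(all.equal(se_pooled, se_full))
```

With several regressors the same idea would carry over by summing the cross-product matrices X'X and X'y of the demeaned data instead of scalars. If a firm is spread over more than one chunk, its demeaning differs from the full-data demeaning and the pooled numbers would no longer match.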
