如何计算纵向/面板数据的非线性二进制固定效应 Logit？

问题描述

我正在尝试根据儿童学校愿望的滞后变量来估计儿童工作。
我正在决定是否应该使用 glm 或 clogit 来运行我的模型（需要固定效应 logits）。当我运行 glm 时，我的系数与我的 clogit 非常不同。

model1 <- glm(chldwork~lag_aspgrade_binned+age+as.factor(childid),data=finaletdtlag,family='binomial')

  GLM Output:
Call:
glm(formula = chldwork ~ lag_aspgrade_binned + age + as.factor(childid),family = "binomial",data = finaletdtlag)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-3.02350   0.00001   0.00002   0.17344   2.13769  

Coefficients:
                                                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)                                     3.037e+01  1.933e+04   0.002   0.9987    
lag_aspgrade_binneddid not complete elementary  2.339e+00  1.083e+00   2.161   0.0307 *  
lag_aspgrade_binnedhs                           1.252e+00  6.082e-01   2.059   0.0395 *  
lag_aspgrade_binnedprimary some hs              1.206e+00  6.739e-01   1.789   0.0735 .  
lag_aspgrade_binnedsome college                 2.081e+00  4.800e-01   4.335 1.46e-05 ***
age                                            -6.123e-01  3.995e-02 -15.326  < 2e-16 ***

此外，当我运行我的 clogit 时，我的输出中没有截获（如本示例所示：https://data.princeton.edu/wws509/r/fixedRandom3）。

我的 clogit 输出：

 > modela <- clogit(chldwork~lag_aspgrade_binned+age+strata(childid),method = 'exact')
> summary(modela)
Call:
coxph(formula = Surv(rep(1,2770L),chldwork) ~ lag_aspgrade_binned + 
    age + strata(childid),data = finaletdtlag,method = "exact")

  n= 2770,number of events= 2358 

                                                   coef exp(coef) se(coef)       z Pr(>|z|)    
lag_aspgrade_binneddid not complete elementary  1.09351   2.98473  0.83332   1.312  0.18944    
lag_aspgrade_binnedhs                           0.53032   1.69948  0.45095   1.176  0.23959    
lag_aspgrade_binnedprimary some hs              0.49815   1.64567  0.50075   0.995  0.31983    
lag_aspgrade_binnedsome college                 1.00269   2.72560  0.34619   2.896  0.00377 ** 
age                                            -0.36846   0.69180  0.02905 -12.684  < 2e-16 ***

我的代码有错误吗？你们都更喜欢一种吗？

解决方法

使用最小二乘虚拟变量 (LSDV) 方法计算非线性（二元）固定效应 (FE) 模型，就像使用 glm 一样，尚未考虑附带参数问题 (IPP)，并且因此估计量是有偏差的。基本上，IPP 发生在观察的数量以及个体虚拟变量的数量与观察到的时间段相比较大时。您可以在 this answer on Cross Validated 中找到 IPP 的统计解释。

survival::clogit 以及与 bife::bife 结合使用的 bife::bias_corr 都考虑了 IPP。

让我们尝试使用 glm 计算二元有限元模型（请参阅答案底部的“数据”）。

s <- summary(g.fit <- glm(y ~ 0 + x1 + x2 + factor(id),data=idc,family='binomial'))$coe
s[-grep("factor",rownames(s)),]
#      Estimate Std. Error  z value   Pr(>|z|)
# x1 0.80750181 0.32069729 2.517956 0.01180379
# x2 0.08353417 0.05040558 1.657240 0.09747086

使用 bife::bife 计算模型最初给出了相同的结果。

summary(b.fit <- bife::bife(y ~ x1 + x2 | id,data=idc))$cm
#      Estimate Std. error  z value  Pr(> |z|)
# x1 0.80750173 0.32070309 2.517911 0.01180533
# x2 0.08353417 0.05040645 1.657212 0.09747662

然而，结果有偏差，当我们校正 IPP 时，我们得到的估计值更小：

summary(b.fit.corr <- bias_corr(b.fit))$cm
#      Estimate Std. error  z value  Pr(> |z|)
# x1 0.65672233  0.3192673 2.056967 0.03968939
# x2 0.06672389  0.0498996 1.337163 0.18116946

clogit 现在已经将 IPP 考虑在内。

summary(survival::clogit(y ~ x1 + x2 + strata(id),data=idc))$coe
#         coef exp(coef)  se(coef)        z   Pr(>|z|)
# x1 0.6533630  1.921994 0.2875215 2.272397 0.02306255
# x2 0.0659169  1.068138 0.0449555 1.466270 0.14257474

请注意，bife::bife 更保守地计算标准误差。但是，使用 Stata 我们得到与使用 survival::clogit:

相同的结果

. use http://www.stata-press.com/data/r13/clogitid,clear

. clogit y x1 x2,group(id) noheader
note: multiple positive outcomes within groups encountered.

Iteration 0:   log likelihood = -123.42828  
Iteration 1:   log likelihood = -123.41386  
Iteration 2:   log likelihood = -123.41386  
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |    .653363   .2875215     2.27   0.023     .0898312    1.216895
          x2 |   .0659169   .0449555     1.47   0.143    -.0221943    .1540281
------------------------------------------------------------------------------

因此，要回答您的问题，您的 clogit 方法的结果应该是首选。

您的“失踪”拦截问题解释如下。在线性回归模型中，我们有 one 总体截距 a,

而在固定效果中，我们有许多个单独的拦截a_i。

bife::bife 将固定效果存储在名为 alpha 的对象中：

b.fit.corr$alpha
#         1014         1017         1019         1023         1025         1027 
# -1.192037100 -0.740369821 -0.093573868 -0.740850707  0.374930145 -1.179830711 
#         1029         1030         1031         1032         1039         1043 
# -0.702326346 -0.460228413 -1.441325728 -1.286722837 -0.117864675 -2.140867994 
#         1044         1045         1047         1050         1055         1060 
# -2.114299987  0.645971508 -0.436457378 -1.165316816 -1.657762052 -0.720114822 
#         1069         1073         1074         1075         1076         1077 
# -1.637936117 -0.782373571 -1.395162657 -1.395167427 -1.637684316 -1.696587849 
#         1078         1092         1094         1095         1097         1098 
# -1.106264431 -0.127039684 -1.439563580 -1.310283872 -0.778302356 -1.982045758 
#         1099         1104         1105         1108         1110         1113 
# -0.005352088 -0.006881257 -1.802152970 -1.165296728 -0.314567361 -1.817725352 
#         1118         1120         1121         1122         1125         1126 
# -1.110847553 -2.128173826 -1.803025930 -1.164773956 -1.107030040 -2.251146286 
#         1128         1133         1137         1141         1142         1144 
# -0.940981858 -1.416409893 -1.441811848 -0.330724832 -1.657610560 -1.136508411 
#         1146         1147         1148         1149         1150         1154 
# -1.286055926  0.196135238 -1.107793309 -1.637240256 -2.226747493 -0.701304388 
#         1155         1156         1157         1163         1169         1172 
# -1.658472805 -1.654763658 -1.134978654 -2.024766764 -1.440093115 -0.940165139 
#         1176         1181         1186         1187         1191         1195 
# -0.481589127 -2.114877897 -1.137394808 -0.006881257 -1.636654053 -0.027152409

它们对应于 glm 模型中的各个假人，

g.fit$coefficients[grep("factor",names(g.fit$coe))]

然而——如上所述——由于 IPP，这些是有偏见的。

注意：我找不到使用 survival::clogit 提取 FE 的方法，如果有人知道如何执行此操作，请在评论中告诉我！

数据：

idc <- readstata13::read.dta13("http://www.stata-press.com/data/r11/clogitid.dta")

glm logistic-regression longitudinal r

如何计算纵向/面板数据的非线性二进制固定效应 Logit？

问题描述

解决方法

数据：

相关问答