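Problem description
When running an aggregated regression using the weights argument of glm, I can add categorical predictors and get results that match the regression on the individual data (ignoring differences in df). When I add a continuous predictor, however, the results no longer match.
For example: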
summary(glm(am ~ as.factor(cyl) + carb,data = mtcars,family = binomial(link = "logit")))
##
## Call:
## glm(formula = am ~ as.factor(cyl) + carb, family = binomial(link = "logit"),
##     data = mtcars)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8699 -0.5506 -0.1869 0.6185 1.9806
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.6718 1.0924 -0.615 0.53854
## as.factor(cyl)6 -3.7609 1.9072 -1.972 0.04862 *
## as.factor(cyl)8 -5.5958 1.9381 -2.887 0.00389 **
## carb 1.1144 0.5918 1.883 0.05967 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 43.230 on 31 degrees of freedom
## Residual deviance: 26.287 on 28 degrees of freedom
## AIC: 34.287
##
## Number of Fisher Scoring iterations: 5
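The results above match those from the aggregated data:
library(dplyr)
# One row per (cyl, carb) cell: n = cell size, am = proportion of manual cars
mtcars_percent <- mtcars %>%
  group_by(cyl, carb) %>%
  summarise(
    n = n(), am = sum(am)/n
  )
# Proportion response, with the number of trials passed as prior weights
summary(glm(am ~ as.factor(cyl) + carb, family = binomial(link = "logit"),
            data = mtcars_percent, weights = n))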
##
## Call:
## glm(formula = am ~ as.factor(cyl) + carb, family = binomial(link = "logit"),
##     data = mtcars_percent, weights = n)
##
## Deviance Residuals:
## 1 2 3 4 5 6 7 8
## 0.9179 -0.9407 -0.3772 -0.0251 0.4468 -0.3738 -0.5602 0.1789
## 9
## 0.3699
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.6718 1.0925 -0.615 0.53858
## as.factor(cyl)6 -3.7609 1.9074 -1.972 0.04865 *
## as.factor(cyl)8 -5.5958 1.9383 -2.887 0.00389 **
## carb 1.1144 0.5919 1.883 0.05971 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 19.6356 on 8 degrees of freedom
## Residual deviance: 2.6925 on 5 degrees of freedom
## AIC: 18.485
##
## Number of Fisher Scoring iterations: 5
The coefficients and standard errors above match, up to small numerical differences.
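The agreement can also be checked programmatically rather than by eye. The following is a minimal sketch, not part of the original post: it rebuilds the aggregated data with base R's aggregate() instead of dplyr, and the names d, agg, fit_ind, fit_agg, and coef_gap are made up for illustration.

```r
# Individual-level logistic regression on the 32 cars.
fit_ind <- glm(am ~ as.factor(cyl) + carb, data = mtcars,
               family = binomial(link = "logit"))

# Aggregate to one row per (cyl, carb) cell.
# After summing, n is the cell size and am is the count of manual cars.
d <- transform(mtcars, n = 1)
agg <- aggregate(cbind(n, am) ~ cyl + carb, data = d, FUN = sum)
agg$am <- agg$am / agg$n  # convert the count to a proportion

# Proportion response with the number of trials supplied as prior weights.
fit_agg <- glm(am ~ as.factor(cyl) + carb, data = agg,
               family = binomial(link = "logit"), weights = n)

# Largest absolute difference between the two coefficient vectors.
coef_gap <- max(abs(coef(fit_ind) - coef(fit_agg)))
print(coef_gap)
```

If the two fits agree as in the summaries above, coef_gap should be close to zero, limited only by the convergence tolerance of the fitting algorithm.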
However, adding a continuous predictor (for example, mpg) to this experiment produces a discrepancy. Individual data:
summary(glm(formula = am ~ as.factor(cyl) + carb + mpg,family = binomial,data = mtcars))
##
## Call:
## glm(formula = am ~ as.factor(cyl) + carb + mpg, family = binomial,
##     data = mtcars)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8933 -0.4595 -0.1293 0.1475 1.6969
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -18.3024 9.3442 -1.959 0.0501 .
## as.factor(cyl)6 -1.8594 2.5963 -0.716 0.4739
## as.factor(cyl)8 -0.3029 2.8828 -0.105 0.9163
## carb 1.6959 0.9918 1.710 0.0873 .
## mpg 0.6771 0.3645 1.858 0.0632 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 43.230 on 31 degrees of freedom
## Residual deviance: 18.467 on 27 degrees of freedom
## AIC: 28.467
##
## Number of Fisher Scoring iterations: 6
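Now the aggregated data:
mtcars_percent <- mtcars %>%
  group_by(cyl, carb) %>%
  summarise(
    n = n(), am = sum(am)/n, mpg = mean(mpg)  # mpg is averaged within each cell
  )
mtcars_percent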
# A tibble: 9 x 5
# Groups: cyl [3]
cyl carb n am mpg
<dbl> <dbl> <int> <dbl> <dbl>
1 4 1 5 0.8 27.6
2 4 2 6 0.667 25.9
3 6 1 2 0 19.8
4 6 4 4 0.5 19.8
5 6 6 1 1 19.7
6 8 2 4 0 17.2
7 8 3 3 0 16.3
8 8 4 6 0.167 13.2
9 8 8 1 1 15
glm(formula = am ~ as.factor(cyl) + carb + mpg, family = binomial,
    data = mtcars_percent, weights = n) %>%
  summary()
##
## Call:
## glm(formula = am ~ as.factor(cyl) + carb + mpg, family = binomial,
##     data = mtcars_percent, weights = n)
##
## Deviance Residuals:
## 1 2 3 4 5 6 7 8
## 0.75845 -0.73755 -0.24505 -0.02649 0.34041 -0.50528 -0.74002 0.46178
## 9
## 0.17387
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -11.3593 19.9611 -0.569 0.569
## as.factor(cyl)6 -1.7932 3.7491 -0.478 0.632
## as.factor(cyl)8 -1.4419 7.3124 -0.197 0.844
## carb 1.4059 1.0718 1.312 0.190
## mpg 0.3825 0.7014 0.545 0.585
##
## (dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 19.6356 on 8 degrees of freedom
## Residual deviance: 2.3423 on 4 degrees of freedom
## AIC: 20.134
##
## Number of Fisher Scoring iterations: 6
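The coefficients, standard errors, and p-values now differ. I would like to understand why, and what can be done to match the individual-data model.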
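The help page for glm() states that "weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations."
I took this to mean that I could compute mean(mpg) within each grouping cell, as I did, and the regression should still work. Clearly I am misunderstanding something...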
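One way to read the quoted sentence is through the log-likelihood. The derivation below is a sketch in notation that does not appear in the help page: g indexes the aggregation cells, n_g is the cell size, x_i is the predictor vector of car i, x_g is a predictor vector shared by every car in cell g, and \bar{y}_g is the proportion of manual cars in the cell.

```latex
\begin{align*}
\ell(\beta)
  &= \sum_{g} \sum_{i \in g}
     \Bigl[ y_i \, x_i^{\top}\beta - \log\bigl(1 + e^{x_i^{\top}\beta}\bigr) \Bigr]
  && \text{(individual Bernoulli data)} \\
  &= \sum_{g} n_g
     \Bigl[ \bar{y}_g \, x_g^{\top}\beta - \log\bigl(1 + e^{x_g^{\top}\beta}\bigr) \Bigr]
  && \text{if } x_i = x_g \text{ for all } i \in g .
\end{align*}
```

The second line is, up to a constant that does not involve \beta, the weighted binomial log-likelihood for the proportions \bar{y}_g with weights n_g, which is why the aggregated fit reproduces the individual fit when cells are defined by cyl and carb alone. The step between the two lines needs every car in a cell to share the same predictor values. When mpg varies within a cell and is replaced by its cell mean, that step no longer holds, because \log(1 + e^{\eta}) is nonlinear in \eta and \sum_{i \in g} y_i x_i generally differs from n_g \bar{y}_g \bar{x}_g, so the two models maximize different likelihoods.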
Thanks for your help.
Solution
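One possible approach, offered as a sketch rather than a confirmed answer: aggregate only over cells in which every predictor in the model is constant, which here means grouping by mpg as well as cyl and carb. The code below compares that aggregation with the cell-mean aggregation from the question. It uses base R's aggregate() instead of dplyr, and the names d, agg_exact, agg_mean, fit_exact, fit_mean, gap_exact, and gap_mean are made up for illustration.

```r
# Reference fit on the individual data.
f <- am ~ as.factor(cyl) + carb + mpg
fit_ind <- glm(f, data = mtcars, family = binomial)

d <- transform(mtcars, n = 1)

# (a) Cells defined by every predictor, so predictors are constant within a cell.
agg_exact <- aggregate(cbind(n, am) ~ cyl + carb + mpg, data = d, FUN = sum)
agg_exact$am <- agg_exact$am / agg_exact$n
fit_exact <- glm(f, data = agg_exact, family = binomial, weights = n)

# (b) Cells defined by cyl and carb only, with mpg replaced by its cell mean,
#     as in the question.
agg_mean <- aggregate(cbind(n, am, mpg) ~ cyl + carb, data = d, FUN = sum)
agg_mean$am  <- agg_mean$am  / agg_mean$n
agg_mean$mpg <- agg_mean$mpg / agg_mean$n
fit_mean <- glm(f, data = agg_mean, family = binomial, weights = n)

# Largest absolute coefficient difference from the individual-data fit.
gap_exact <- max(abs(coef(fit_ind) - coef(fit_exact)))
gap_mean  <- max(abs(coef(fit_ind) - coef(fit_mean)))
print(c(exact = gap_exact, mean = gap_mean))
```

If the reasoning holds, gap_exact should be near zero while gap_mean reflects the discrepancy reported in the question. The trade-off is that mpg takes many distinct values in mtcars, so aggregation (a) compresses the data far less than grouping by cyl and carb alone.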