置信区间：使用 mob() 树partykit 包和 Logistic() 模型基础知识注意事项插图

问题描述

如何计算 MOB 对象终端节点中模型的 CI？或者，我是否可以提取终端节点的每个模型，以便在我的情况下，它是一个逻辑回归模型并计算每个的 CI？

谢谢

解决方法

基础知识

使用 refit.modelparty()，您可以重新拟合与 modelparty 或 mob() 返回的 glmtree() 对象的节点关联的模型。然后你可以对这些应用常用函数来提取你感兴趣的信息。

注意事项

学习树后的推断不再“诚实”，即在您的情况下，置信区间的覆盖范围可能不会达到名义水平。要获得真实的结果，您必须拆分数据并在一个子样本上学习树，然后在另一个子样本的终端节点中重新拟合模型。
如果您使用 glmtree()，那么每个节点内的模型都不是完整的 glm 对象，以便更有效地拟合树。因此，来自 confint() 方法的配置文件似然置信区间不是开箱即用的。您可以求助于 Wald 置信区间（例如，在 coefci() 中的 lmtest 中可用）或重新拟合模型（类似于注释 1。）。

插图

作为起点，让我们首先考虑一个基于 glmtree() 和 coefci() 的简单示例：

## Pima Indians diabetes data
data("PimaIndiansDiabetes",package = "mlbench")
 
## recursive partitioning of a logistic regression model
library("partykit")
pid_tree <- glmtree(diabetes ~ glucose | pregnant +
  pressure + triceps + insulin + mass + pedigree + age,data = PimaIndiansDiabetes,family = binomial)

## extract fitted models in terminal nodes
pid_glm <- refit.modelparty(pid_tree,node = nodeids(pid_tree,terminal = TRUE))

## compute Wald confidence intervals
library("lmtest")
lapply(pid_glm,coefci)
## $`2`
##                    2.5 %      97.5 %
## (Intercept) -13.36226424 -6.54075507
## glucose       0.03496439  0.08245134
## 
## $`4`
##                   2.5 %      97.5 %
## (Intercept) -8.27393574 -5.13723535
## glucose      0.03466962  0.05900533
## 
## $`5`
##                   2.5 %      97.5 %
## (Intercept) -3.84548740 -1.69642032
## glucose      0.01529946  0.03177217

对于我们可以采取的诚实方法：

## id for sample splitting
n <- nrow(PimaIndiansDiabetes)
id <- sample(1:n,round(n/2))

## estimate tree on learning sample
pid_tree <- glmtree(diabetes ~ glucose | pregnant +
  pressure + triceps + insulin + mass + pedigree + age,data = PimaIndiansDiabetes[id,],family = binomial)

## out-of-sample prediction
pid_new <- PimaIndiansDiabetes[-id,]
pid_new$node <- predict(pid_tree,newdata = pid_new,type = "node")

## fit separate models on each subset,splitted by predicted node
pid_glm <- lapply(
  split(pid_new,pid_new$node),function(d) glm(diabetes ~ glucose,data = d,family = binomial)
)

## obtain profile likelihood confidence intervals
lapply(pid_glm,confint)
## Waiting for profiling to be done...
## Waiting for profiling to be done...
## Waiting for profiling to be done...
## $`2`
##                   2.5 %     97.5 %
## (Intercept) -8.25379802 -4.5031005
## glucose      0.02743191  0.0569824
## 
## $`4`
##                    2.5 %     97.5 %
## (Intercept) -25.32777607 -4.5078931
## glucose       0.02090339  0.1617026
## 
## $`5`
##                   2.5 %      97.5 %
## (Intercept) -6.38422374 -2.66969443
## glucose      0.02222873  0.05060984

在基于线性预测器（如逻辑 GLM）的模型中，也可以拟合代表整棵树的单个模型。您只需要包含所有回归量与节点指示器的交互。要获得相同的参数和置信区间，您需要使用嵌套编码 (/) 而不是默认交互编码 (*)：

pid_glm2 <- glm(diabetes ~ 0 + factor(node)/glucose,data = pid_new,family = binomial)
confint(pid_glm2)
## Waiting for profiling to be done...
##                              2.5 %      97.5 %
## factor(node)2          -8.25379802 -4.50310050
## factor(node)4         -25.32777607 -4.50789313
## factor(node)5          -6.38422332 -2.66969490
## factor(node)2:glucose   0.02743191  0.05698240
## factor(node)4:glucose   0.02090339  0.16170262
## factor(node)5:glucose   0.02222874  0.05060983

party r r

置信区间：使用 mob() 树partykit 包和 Logistic() 模型 基础知识注意事项插图

问题描述

解决方法

基础知识

注意事项

插图

置信区间：使用 mob() 树partykit 包和 Logistic() 模型基础知识注意事项插图