Adding an offset term to the glm formula in geom_smooth and stat_fit_tidy

Problem description

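I have a data.frame of counts for two groups in each of three clusters. I'm fitting a logistic regression (a binomial glm with a logit link function) to it, plotting everything with ggplot2's geom_bar and geom_smooth, and adding p-values with ggpmisc's stat_fit_tidy.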

Here is what it looks like:

Data:

library(dplyr)

observed.probability.df <- data.frame(cluster = c("c1","c1","c2","c2","c3","c3"),group = rep(c("A","B"),3),p = c(0.4,0.6,0.5,0.5,0.6,0.4))
# Expand each cluster/group into 0/1 observations: 1000*p(own group) ones and 1000*p(other group) zeros
observed.data.df <- do.call(rbind,lapply(c("c1","c2","c3"),function(l){
  do.call(rbind,lapply(c("A","B"),function(g)
    data.frame(cluster = l,group = g,value = c(rep(0,1000*dplyr::filter(observed.probability.df,cluster == l & group != g)$p),rep(1,1000*dplyr::filter(observed.probability.df,cluster == l & group == g)$p)))
  ))
}))

# Fix the level order of the factors used for faceting and on the x axis
observed.probability.df$cluster <- factor(observed.probability.df$cluster,levels = c("c1","c2","c3"))
observed.data.df$cluster <- factor(observed.data.df$cluster,levels = c("c1","c2","c3"))
observed.probability.df$group <- factor(observed.probability.df$group,levels = c("A","B"))
observed.data.df$group <- factor(observed.data.df$group,levels = c("A","B"))

Plot:

library(ggplot2)
library(ggpmisc)

ggplot(observed.probability.df,aes(x = group,y = p,group = cluster,fill = group)) +
  geom_bar(stat = 'identity') +
  geom_smooth(data = observed.data.df,mapping = aes(x = group,y = value,group = cluster),color = "black",method = 'glm',method.args = list(family = binomial(link = 'logit'))) + 
  stat_fit_tidy(data = observed.data.df,mapping = aes(x = group,y = value,group = cluster,label = sprintf("P = %.3g",stat(x_p.value))),method = "glm",method.args = list(formula = y ~ x,family = binomial(link = 'logit')),parse = T,label.x = "center",label.y = "top") +
  scale_x_discrete(name = NULL,labels = levels(observed.probability.df$group),breaks = sort(unique(observed.probability.df$group))) +
  facet_wrap(as.formula("~ cluster")) + theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")
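The P values that stat_fit_tidy prints come from one glm per cluster (each cluster is both a group and a facet panel here), and broom's tidy() reports the Wald z test from summary.glm. As a cross-check, the same fits can be reproduced outside ggplot. This is a minimal base-R sketch assuming illustrative probabilities (0.4/0.6, 0.5/0.5 and 0.6/0.4 for A/B in c1, c2 and c3); the object names are hypothetical. It uses group as a factor, whereas inside the layer x is the integer position 1 or 2, so the groupB coefficient here corresponds to the x slope whose p-value stat_fit_tidy labels.

```r
# Illustrative probabilities of a "1" per cluster and group (assumed values)
probs <- data.frame(cluster = rep(c("c1", "c2", "c3"), each = 2),
                    group   = rep(c("A", "B"), times = 3),
                    p       = c(0.4, 0.6, 0.5, 0.5, 0.6, 0.4))

# Expand into 0/1 observations the same way as the question's code:
# round(1000 * p_own) ones and round(1000 * p_other) zeros per group
obs <- do.call(rbind, lapply(unique(probs$cluster), function(l) {
  do.call(rbind, lapply(c("A", "B"), function(g) {
    p_same  <- probs$p[probs$cluster == l & probs$group == g]
    p_other <- probs$p[probs$cluster == l & probs$group != g]
    data.frame(cluster = l, group = g,
               value = c(rep(0, round(1000 * p_other)),
                         rep(1, round(1000 * p_same))))
  }))
}))
obs$group <- factor(obs$group, levels = c("A", "B"))

# One binomial glm per cluster, mirroring one fit per group/panel in the plot
fits <- lapply(split(obs, obs$cluster), function(d)
  glm(value ~ group, family = binomial(link = "logit"), data = d))

# Estimate and Wald p-value of the B vs A term in each cluster
res <- t(sapply(fits, function(m)
  coef(summary(m))["groupB", c("Estimate", "Pr(>|z|)")]))
print(res)
```

Because each fit has one parameter per group, the groupB estimate is simply the observed log odds ratio of B versus A in that cluster.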

Suppose I have an expected probability for each group, which I want to add as an offset to the glm in geom_smooth and stat_fit_tidy. How can I do that?

Following this Cross Validated post, I added these offsets to observed.data.df:

observed.data.df <- observed.data.df %>% dplyr::left_join(data.frame(group = factor(c("A","B"),levels = c("A","B")),p = qlogis(c(0.45,0.55))),by = "group")
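qlogis() is the logit function, so the joined p column holds the expected log-odds of each group. Used as offset(p), it fixes those log-odds as a baseline, and the intercept and group coefficient then measure the departure from the expected probabilities. Outside ggplot, where p is an ordinary column of the model's data, glm accepts this directly. A minimal base-R sketch for a single cluster, assuming illustrative observed fractions of 0.4 (A) and 0.6 (B) and the expected 0.45/0.55 from above; the names d, expected_logit, m0 and m1 are hypothetical.

```r
# One cluster's 0/1 data: group A has 40% ones, group B has 60% ones (assumed)
d <- data.frame(
  group = factor(rep(c("A", "B"), each = 1000), levels = c("A", "B")),
  value = c(rep(0, 600), rep(1, 400),   # group A
            rep(0, 400), rep(1, 600)))  # group B

# Expected probabilities per group, stored on the logit scale
expected_logit <- c(A = qlogis(0.45), B = qlogis(0.55))
d$p <- unname(expected_logit[as.character(d$group)])  # base-R stand-in for the join

# p is a column of `data`, so offset(p) resolves without trouble here
m0 <- glm(value ~ group,             family = binomial(link = "logit"), data = d)
m1 <- glm(value ~ group + offset(p), family = binomial(link = "logit"), data = d)

cmp <- cbind(no_offset = coef(m0), with_offset = coef(m1))
print(cmp)
```

With the offset, the groupB coefficient becomes (logit 0.6 - logit 0.55) - (logit 0.4 - logit 0.45), i.e. the observed log odds ratio minus the expected one.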

and then tried adding an offset(p) term to the formulas in geom_smooth and stat_fit_tidy:

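ggplot(observed.probability.df,aes(x = group,y = p,group = cluster,fill = group)) +
  geom_bar(stat = 'identity') +
  geom_smooth(data = observed.data.df,mapping = aes(x = group,y = value,group = cluster),color = "black",method = 'glm',method.args = list(formula = y ~ x + offset(p),family = binomial(link = 'logit'))) + 
  stat_fit_tidy(data = observed.data.df,mapping = aes(x = group,y = value,group = cluster,label = sprintf("P = %.3g",stat(x_p.value))),method = "glm",method.args = list(formula = y ~ x + offset(p),family = binomial(link = 'logit')),parse = T,label.x = "center",label.y = "top") +
  scale_x_discrete(name = NULL,labels = levels(observed.probability.df$group),breaks = sort(unique(observed.probability.df$group))) +
  facet_wrap(as.formula("~ cluster")) + theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")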

But I get these warnings:

Warning messages:
1: Computation failed in `stat_smooth()`:
invalid type (closure) for variable 'offset(p)' 
2: Computation failed in `stat_smooth()`:
invalid type (closure) for variable 'offset(p)' 
3: Computation failed in `stat_smooth()`:
invalid type (closure) for variable 'offset(p)' 
4: Computation failed in `stat_fit_tidy()`:
invalid type (closure) for variable 'offset(p)' 
5: Computation failed in `stat_fit_tidy()`:
invalid type (closure) for variable 'offset(p)' 
6: Computation failed in `stat_fit_tidy()`:
invalid type (closure) for variable 'offset(p)'

This indicates that the offset term is not recognized, and the plot is drawn with the bars only:

Any idea how to add an offset term to the glm in both geom_smooth and stat_fit_tidy? Or even just to the geom_smooth glm (with the stat_fit_tidy line commented out)?

Alternatively, is it possible to add a glm fitted outside of ggplot to the plot built on the geom_bar call?

Solution

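The problem is that in a ggplot model formula, x and y stand for aesthetics, not for the names of variables in data: the names used in the formula are aesthetic names. There is no p aesthetic, so p cannot be found when the model is fitted. Passing a numeric vector here would not work either, because ggplot splits the data into groups and fits the model separately to each group; only a single number could be passed as a constant value. I think one would need to define a new pseudo-aesthetic, with its corresponding scale, to fit the model this way.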
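For the "alternatively" part of the question, one workaround consistent with this explanation is to fit the offset glm outside ggplot, where the offset is an ordinary column, and keep only the results for plotting. This is a minimal base-R sketch under assumed data (observed A fractions 0.4, 0.5 and 0.6 in c1, c2 and c3, with B getting the complement, and the expected 0.45/0.55 from the question); all object names are hypothetical. Because each cluster's model has one parameter per group, the fitted probabilities reproduce the observed fractions whether or not the offset is present; the offset changes the coefficients and their p-values, not the fitted values at A and B.

```r
# Observed fraction of ones in group A per cluster (assumed); B gets 1 minus that
frac_A <- c(c1 = 0.4, c2 = 0.5, c3 = 0.6)
# Expected probabilities from the question, on the logit scale
exp_logit <- c(A = qlogis(0.45), B = qlogis(0.55))

sim <- do.call(rbind, lapply(names(frac_A), function(l) {
  fa <- frac_A[[l]]
  fb <- 1 - fa
  data.frame(cluster = l,
             group   = rep(c("A", "B"), each = 1000),
             value   = c(rep(0, round(1000 * (1 - fa))), rep(1, round(1000 * fa)),
                         rep(0, round(1000 * (1 - fb))), rep(1, round(1000 * fb))))
}))
sim$group <- factor(sim$group, levels = c("A", "B"))
# Offset column: the expected log-odds of each row's group
sim$off <- unname(exp_logit[as.character(sim$group)])

# Prediction grid: one row per group, carrying that group's offset
nd <- data.frame(group = factor(c("A", "B"), levels = c("A", "B")),
                 off   = unname(exp_logit[c("A", "B")]))

# Fit per cluster, then keep fitted probabilities and the B vs A p-value
pred <- do.call(rbind, lapply(split(sim, sim$cluster), function(d) {
  m <- glm(value ~ group + offset(off), family = binomial(link = "logit"), data = d)
  data.frame(cluster = d$cluster[1],
             group   = nd$group,
             fitted  = unname(predict(m, newdata = nd, type = "response")),
             p_value = coef(summary(m))["groupB", "Pr(>|z|)"])
}))
rownames(pred) <- NULL
print(pred)
```

pred could then be drawn over the bars as its own layer, for example `geom_point(data = pred, aes(x = group, y = fitted), inherit.aes = FALSE)`; since it has a cluster column, `facet_wrap(~ cluster)` places each row in its cluster's panel, and the p_value column could feed a geom_text layer in the same way.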
