向 geom_smooth 和 stat_fit_tidy 中的 glm 公式添加偏移项

问题描述

我有一个 data.frame,在三个 group 中每两个 cluster 的计数我正在拟合逻辑回归 (binomial glmlogit link function),并使用 ggplot2geom_bargeom_smooth 绘制所有内容,并使用 ggpmisc添加 p 值stat_fit_tidy

这是它的样子:

数据:

library(dplyr)

observed.probability.df <- data.frame(cluster = c("c1","c1","c2","c3","c3"),group = rep(c("A","B"),3),p = c(0.4,0.6,0.5,0.4))
observed.data.df <- do.call(rbind,lapply(c("c1",function(l){
  do.call(rbind,lapply(c("A",function(g)
    data.frame(cluster = l,group = g,value = c(rep(0,1000*dplyr::filter(observed.probability.df,cluster == l & group != g)$p),rep(1,cluster == l & group == g)$p)))
  ))
}))

observed.probability.df$cluster <- factor(observed.probability.df$cluster,levels = c("c1","c3"))
observed.data.df$cluster <- factor(observed.data.df$cluster,"c3"))
observed.probability.df$group <- factor(observed.probability.df$group,levels = c("A","B"))
observed.data.df$group <- factor(observed.data.df$group,"B"))

剧情:

library(ggplot2)
library(ggpmisc)

ggplot(observed.probability.df,aes(x = group,y = p,group = cluster,fill = group)) +
  geom_bar(stat = 'identity') +
  geom_smooth(data = observed.data.df,mapping = aes(x = group,y = value,group = cluster),color = "black",method = 'glm',method.args = list(family = binomial(link = 'logit'))) + 
  stat_fit_tidy(data = observed.data.df,label = sprintf("P = %.3g",stat(x_p.value))),method.args = list(formula = y ~ x,family = binomial(link = 'logit')),parse = T,label.x = "center",label.y = "top") +
  scale_x_discrete(name = NULL,labels = levels(observed.probability.df$group),breaks = sort(unique(observed.probability.df$group))) +
  facet_wrap(as.formula("~ cluster")) + theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")

enter image description here

假设我有每个 group 的预期概率,我想将其作为 offset 添加geom_smoothstat_fit_tidy glm。我该怎么做?

this Cross Validated post 之后,我将这些偏移量添加observed.data.df

observed.data.df <- observed.data.df %>% dplyr::left_join(data.frame(group = c("A",p = qlogis(c(0.45,0.55))))

然后尝试将 offset(p) 表达式添加geom_smoothstat_fit_tidy

ggplot(observed.probability.df,method.args = list(formula = y ~ x + offset(p),family = binomial(link = 'logit'))) + 
  stat_fit_tidy(data = observed.data.df,breaks = sort(unique(observed.probability.df$group))) +
  facet_wrap(as.formula("~ cluster")) + theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")

但我收到这些警告:

Warning messages:
1: computation Failed in `stat_smooth()`:
invalid type (closure) for variable 'offset(p)' 
2: computation Failed in `stat_smooth()`:
invalid type (closure) for variable 'offset(p)' 
3: computation Failed in `stat_smooth()`:
invalid type (closure) for variable 'offset(p)' 
4: computation Failed in `stat_fit_tidy()`:
invalid type (closure) for variable 'offset(p)' 
5: computation Failed in `stat_fit_tidy()`:
invalid type (closure) for variable 'offset(p)' 
6: computation Failed in `stat_fit_tidy()`:
invalid type (closure) for variable 'offset(p)' 

表示无法识别此添加,并且绘图仅带有条形:

enter image description here

知道如何将偏移项添加geom_smoothstat_fit_tidy glm 吗?或者甚至只是到 geom_smooth glm(注释掉 stat_fit_tidy 行)?

或者,是否可以将通过在 geom_bar 调用glm 之外拟合 ggplot })?

解决方法

问题在于模型公式中的 ggplot xy 代表美学,而不是 data 中的变量名称,即模型公式中的 ggplot 名称代表美学。没有 p 美感,因此在尝试拟合时,找不到 p。这里不能传递数字向量,因为 ggplot 会将数据分成组并分别为每组拟合模型,我们可以将单个数字向量作为常量值传递。我认为人们需要定义一种新的伪美学及其相应的尺度,才能以这种方式进行拟合。