问题描述
我有一个 data.frame
,在三个 group
中每两个 cluster
的计数我正在拟合逻辑回归 (binomial
glm
与logit
link function
),并使用 ggplot2
的 geom_bar
和 geom_smooth
绘制所有内容,并使用 ggpmisc
的添加 p 值stat_fit_tidy
。
这是它的样子:
数据:
library(dplyr)
observed.probability.df <- data.frame(cluster = c("c1","c1","c2","c3","c3"),group = rep(c("A","B"),3),p = c(0.4,0.6,0.5,0.4))
observed.data.df <- do.call(rbind,lapply(c("c1",function(l){
do.call(rbind,lapply(c("A",function(g)
data.frame(cluster = l,group = g,value = c(rep(0,1000*dplyr::filter(observed.probability.df,cluster == l & group != g)$p),rep(1,cluster == l & group == g)$p)))
))
}))
observed.probability.df$cluster <- factor(observed.probability.df$cluster,levels = c("c1","c3"))
observed.data.df$cluster <- factor(observed.data.df$cluster,"c3"))
observed.probability.df$group <- factor(observed.probability.df$group,levels = c("A","B"))
observed.data.df$group <- factor(observed.data.df$group,"B"))
剧情:
library(ggplot2)
library(ggpmisc)
ggplot(observed.probability.df,aes(x = group,y = p,group = cluster,fill = group)) +
geom_bar(stat = 'identity') +
geom_smooth(data = observed.data.df,mapping = aes(x = group,y = value,group = cluster),color = "black",method = 'glm',method.args = list(family = binomial(link = 'logit'))) +
stat_fit_tidy(data = observed.data.df,label = sprintf("P = %.3g",stat(x_p.value))),method.args = list(formula = y ~ x,family = binomial(link = 'logit')),parse = T,label.x = "center",label.y = "top") +
scale_x_discrete(name = NULL,labels = levels(observed.probability.df$group),breaks = sort(unique(observed.probability.df$group))) +
facet_wrap(as.formula("~ cluster")) + theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")
假设我有每个 group
的预期概率,我想将其作为 offset
添加到 geom_smooth
和 stat_fit_tidy
glm
。我该怎么做?
在 this Cross Validated post 之后,我将这些偏移量添加到 observed.data.df
:
observed.data.df <- observed.data.df %>% dplyr::left_join(data.frame(group = c("A",p = qlogis(c(0.45,0.55))))
然后尝试将 offset(p)
表达式添加到 geom_smooth
和 stat_fit_tidy
:
ggplot(observed.probability.df,method.args = list(formula = y ~ x + offset(p),family = binomial(link = 'logit'))) +
stat_fit_tidy(data = observed.data.df,breaks = sort(unique(observed.probability.df$group))) +
facet_wrap(as.formula("~ cluster")) + theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")
但我收到这些警告:
Warning messages:
1: computation Failed in `stat_smooth()`:
invalid type (closure) for variable 'offset(p)'
2: computation Failed in `stat_smooth()`:
invalid type (closure) for variable 'offset(p)'
3: computation Failed in `stat_smooth()`:
invalid type (closure) for variable 'offset(p)'
4: computation Failed in `stat_fit_tidy()`:
invalid type (closure) for variable 'offset(p)'
5: computation Failed in `stat_fit_tidy()`:
invalid type (closure) for variable 'offset(p)'
6: computation Failed in `stat_fit_tidy()`:
invalid type (closure) for variable 'offset(p)'
表示无法识别此添加,并且绘图仅带有条形:
知道如何将偏移项添加到 geom_smooth
和 stat_fit_tidy
glm
吗?或者甚至只是到 geom_smooth
glm(注释掉 stat_fit_tidy
行)?
或者,是否可以将通过在 geom_bar
调用(glm
之外拟合 ggplot
})?
解决方法
问题在于模型公式中的 ggplot x
和 y
代表美学,而不是 data
中的变量名称,即模型公式中的 ggplot 名称代表美学。没有 p
美感,因此在尝试拟合时,找不到 p
。这里不能传递数字向量,因为 ggplot 会将数据分成组并分别为每组拟合模型,我们可以将单个数字向量作为常量值传递。我认为人们需要定义一种新的伪美学及其相应的尺度,才能以这种方式进行拟合。