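Problem description
I am trying to write a density-plot function that overlays a normal curve as a reference on each facet (group). Below, I have tried to reduce the problem to its core by not defining the function itself.
Attempt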
# Initial setup
library(dplyr)
data <- mtcars
group = "cyl"
variable = "mpg"
gform <- reformulate(".",response=group)
data[[group]] <- as.factor(data[[group]])
# Make data for normal curves
dat_norm <- data %>% group_by(.data[[group]]) %>%
  summarise(mpg=seq(min(.[[variable]]),max(.[[variable]]),length.out=100),
            density=dnorm(seq(min(.[[variable]]),max(.[[variable]]),length.out=100),mean(.[[variable]]),sd(.[[variable]])))
# Make plot
library(ggplot2)
ggplot(data,aes_string(x=variable,fill=group)) +
geom_density() +
geom_line(data=dat_norm,aes_string(x=variable,y="density",group=group),size=1.2) +
facet_grid(gform)
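As you can see, the problem is that ggplot appears to draw the same curve in every facet instead of one curve per group. We could build the curves by hand, but that approach cannot handle an unknown number of groups, which the final function has to support.
Expected result
# As explained above, the previous figure has the same line for each facet.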
# I would like to have the following instead:
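norm.1 <- data %>%
  filter(.[[group]]==levels(.[[group]])[1]) %>%
  with(data.frame(x = seq(min(.[[variable]]),max(.[[variable]]),length.out=100),
                  y = dnorm(seq(min(.[[variable]]),max(.[[variable]]),length.out=100),mean(.[[variable]]),sd(.[[variable]])))) %>%
  mutate(cyl = factor(levels(data[[group]])[1],levels = levels(data[[group]])))
norm.2 <- data %>%
  filter(.[[group]]==levels(.[[group]])[2]) %>%
  with(data.frame(x = seq(min(.[[variable]]),max(.[[variable]]),length.out=100),
                  y = dnorm(seq(min(.[[variable]]),max(.[[variable]]),length.out=100),mean(.[[variable]]),sd(.[[variable]])))) %>%
  mutate(cyl = factor(levels(data[[group]])[2],levels = levels(data[[group]])))
norm.3 <- data %>%
  filter(.[[group]]==levels(.[[group]])[3]) %>%
  with(data.frame(x = seq(min(.[[variable]]),max(.[[variable]]),length.out=100),
                  y = dnorm(seq(min(.[[variable]]),max(.[[variable]]),length.out=100),mean(.[[variable]]),sd(.[[variable]])))) %>%
  mutate(cyl = factor(levels(data[[group]])[3],levels = levels(data[[group]])))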
# Make plot
ggplot(data,aes_string(x=variable,fill=group)) +
geom_density() +
facet_grid(gform) +
geom_line(data = norm.1,aes(x = x,y = y),size=1.2) +
geom_line(data = norm.2,aes(x = x,y = y),size=1.2) +
geom_line(data = norm.3,aes(x = x,y = y),size=1.2)
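Question
As explained, the latter approach forces me to repeat the geom_line() call once per group. Inside a function, however, we will not know the number of groups in advance. Is there a way around this?
Note: this is a follow-up to my previous question.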
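Solution
ggplot is working correctly. The data frame you are creating (dat_norm) simply repeats the overall distribution three times, because inside summarise() the dot refers to the whole piped data frame rather than to the current group. A small change to your summarise call makes it respect the grouping: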
# Initial setup
library(dplyr)
data <- mtcars
group = "cyl"
variable = "mpg"
gform <- reformulate(".",response=group)
data[[group]] <- as.factor(data[[group]])
# Make data for normal curves
dat_norm <- data %>% group_by(.data[[group]]) %>%
# HERE IS THE CHANGE: do(
do(summarise(.,mpg=seq(min(.[[variable]]),max(.[[variable]]),length.out=100),density=dnorm(seq(min(.[[variable]]),max(.[[variable]]),length.out=100),mean(.[[variable]]),sd(.[[variable]]))))
# Make plot
library(ggplot2)
ggplot(data,aes_string(x=variable,fill=group)) +
geom_density() +
geom_line(data=dat_norm,aes_string(x=variable,y="density",group=group),size=1.2) +
facet_grid(gform)
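Since the goal is a function that works for any number of groups, the same per-group idea can also be sketched with newer dplyr and ggplot2 idioms. This is only a sketch under stated assumptions, not the answer's own code: it assumes dplyr >= 1.1.0 (for reframe()) and ggplot2 >= 3.4.0 (for the linewidth argument), and the names normal_curve and density_with_normal are made up here for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: normal curve evaluated over the observed range of v,
# using the mean and standard deviation of v itself.
normal_curve <- function(v, n = 100) {
  x <- seq(min(v), max(v), length.out = n)
  data.frame(x = x, density = dnorm(x, mean = mean(v), sd = sd(v)))
}

# Hypothetical wrapper: faceted density plot with one reference curve per group.
# `variable` and `group` are column names passed as strings.
density_with_normal <- function(data, variable, group) {
  data[[group]] <- as.factor(data[[group]])

  # reframe() evaluates normal_curve() once per group, so .data[[variable]]
  # holds only the current group's values (100 rows are returned per group).
  dat_norm <- data %>%
    group_by(across(all_of(group))) %>%
    reframe(normal_curve(.data[[variable]]))

  p <- ggplot(data, aes(x = .data[[variable]], fill = .data[[group]])) +
    geom_density() +
    geom_line(data = dat_norm, aes(x = x, y = density),
              linewidth = 1.2, inherit.aes = FALSE) +
    facet_grid(reformulate(".", response = group)) +
    labs(x = variable, fill = group)

  # Return the curve data alongside the plot so it can be inspected.
  list(plot = p, curves = dat_norm)
}

res <- density_with_normal(mtcars, variable = "mpg", group = "cyl")
print(res$plot)
```

Because the curve data carries the grouping column, facet_grid() places each curve in its own facet and a single geom_line() call covers however many groups the data contains.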