Problem description
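I want to use facet_grid to split a percentage histogram (whose bars sum to 100%) into two facets. However, once the plot is split into facets, the bars within each facet no longer sum to 100% on their own. A similar question has been resolved here, but I could not adapt that solution to my case, where x is a factor, so a histogram built with stat(density) does not work.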
My data
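A data frame with two columns. equipment indicates whether a household has enough equipment for home schooling, and children_n is the number of children in the household.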
library(tidyverse)
library(magrittr)
df <-
structure(list(equipment = c(1,1,1),children_n = c(4,4,2,3,7,5,8,6,9,4)),row.names = c(NA,-1059L),class = c("tbl_df","tbl","data.frame"))
df
## # A tibble: 1,059 x 2
## equipment children_n
## <dbl> <dbl>
## 1 1 4
## 2 0 4
## 3 1 2
## 4 1 2
## 5 0 2
## 6 1 1
## 7 1 1
## 8 1 3
## 9 1 2
## 10 1 3
## # ... with 1,049 more rows
If a household has 6 or more children, I want to group those cases into a "6+" category.
df %<>%
mutate_at(vars(children_n),as.character) %>%
mutate_at(vars(children_n),recode,"9" = "6_plus","8" = "6_plus","7" = "6_plus","6" = "6_plus") %>%
mutate_at(vars(children_n),fct_relevel,"1","2","3","4","5","6_plus")
glimpse(df)
## Rows: 1,059
## Columns: 2
## $ equipment <dbl> 1,...
## $ children_n <fct> 4,6_plus,...
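Now I want to plot the share of households by number of children in two separate panels: one panel for households with enough equipment and one for households without: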
df %>%
ggplot(data = .,aes(x = children_n,y = equipment)) +
geom_histogram(aes(y = (..count..)/sum(..count..)),stat = "count",fill = "darkblue") +
geom_text(aes(label = scales::percent(((..count..)/sum(..count..)),accuracy = 1),y = ((..count..)/sum(..count..)) ),stat= "count",vjust = -.5,color = "darkblue") +
scale_y_continuous(labels = scales::percent) +
facet_grid(~ equipment,labeller = as_labeller(c("1" = "have enough equipment","0" = "don't have enough equipment")))
This gives two panels that do *not* each sum to 100% on their own:
Attempts to solve the problem
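I found this question, which describes the same goal and the same problem. The accepted solution suggests plotting geom_histogram as a density so that each panel integrates to 100%. That does not work in my case, because stat(density) requires a continuous x variable, whereas my x is a factor.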
df %>%
ggplot(data = .,aes(x = children_n,y = equipment)) +
geom_histogram(aes(y = stat(density) * 6),binwidth = 6,fill = "darkblue") +
facet_grid(~ equipment,labeller = as_labeller(c("1" = "have enough equipment","0" = "don't have enough equipment")))
Error: StatBin requires a continuous x variable: the x variable is discrete. Perhaps you want stat="count"?
Other answers suggest using ..PANEL.., while still others strongly advise against it.
What is the proper way to make the two facets show percentages that each sum to 100% independently?
Solution
This can be achieved as follows:
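- Map the faceting variable to the group aesthetic, so that every bar carries the id of the facet it belongs to.
- Use e.g. tapply to get the total count per group (that is, per facet), and divide each bar's count by the total of its own group.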
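To make the normalisation step concrete, here is a minimal sketch on plain vectors, without any plotting. The counts, the group ids, and the helper name normalise_by_group are made up for illustration; they only mimic what stat = "count" hands over for each bar.

```r
# Made-up bar counts, as stat = "count" might report them (hypothetical values)
count <- c(10, 30, 60, 5, 15)
# Group id of each bar; ggplot2 numbers groups 1, 2, ...
group <- c(1, 1, 1, 2, 2)

# Hypothetical helper: divide each count by the total of its own group
normalise_by_group <- function(count, group) {
  totals <- tapply(count, group, sum)  # one total per group: 100 and 20
  as.numeric(count / totals[group])    # look up each bar's group total by position
}

share <- normalise_by_group(count, group)
print(share)
# Within each group the shares add up to 1
print(tapply(share, group, sum))
```

The lookup totals[group] indexes by position, so it relies on the group ids being the consecutive integers 1, 2, ... that ggplot2 assigns once a grouping is mapped. If that cannot be guaranteed, count / ave(count, group, FUN = sum) gives the same result without positional indexing.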
BTW: I put the normalisation code in a helper function to reduce code duplication and improve readability.
library(tidyverse)
library(magrittr)
df %<>%
mutate_at(vars(children_n),as.character) %>%
mutate_at(vars(children_n),recode,"9" = "6_plus","8" = "6_plus","7" = "6_plus","6" = "6_plus") %>%
mutate_at(vars(children_n),fct_relevel,"1","2","3","4","5","6_plus")
help <- function(count,group) {
count / tapply(count,group,sum)[group]
}
df %>%
ggplot(data = .,aes(x = children_n,y = equipment,group = equipment)) +
geom_histogram(aes(y = help(..count..,..group..)),stat = "count",fill = "darkblue") +
geom_text(aes(label = scales::percent(help(..count..,..group..),accuracy = 1),y = help(..count..,..group..) ),stat= "count",vjust = -.5,color = "darkblue") +
scale_y_continuous(labels = scales::percent) +
facet_grid(~ equipment,labeller = as_labeller(c("1" = "have enough equipment","0" = "don't have enough equipment")))
#> Warning: Ignoring unknown parameters: binwidth,bins,pad
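The warning appears because geom_histogram forwards its binning parameters to stat = "count", which ignores them. As a hedged variant of the same idea, the sketch below uses geom_bar (whose default stat is already "count") and the after_stat() notation that newer ggplot2 versions (3.3.0 and later) offer in place of ..count... The data set toy is simulated and the helper name share_in_group is hypothetical; neither comes from the question.

```r
library(ggplot2)

# Simulated stand-in data (an assumption; not the asker's data set)
set.seed(42)
toy <- data.frame(
  equipment  = rbinom(300, size = 1, prob = 0.7),
  children_n = factor(sample(c("1", "2", "3", "4", "5", "6_plus"), 300, replace = TRUE))
)

# Hypothetical helper: share of each bar within its own group
share_in_group <- function(count, group) {
  count / ave(count, group, FUN = sum)
}

p <- ggplot(toy, aes(x = children_n, group = equipment)) +
  # geom_bar counts rows itself, so no binning parameters are passed along
  geom_bar(aes(y = after_stat(share_in_group(count, group))), fill = "darkblue") +
  geom_text(
    aes(
      label = after_stat(scales::percent(share_in_group(count, group), accuracy = 1)),
      y = after_stat(share_in_group(count, group))
    ),
    stat = "count", vjust = -0.5, colour = "darkblue"
  ) +
  scale_y_continuous(labels = scales::percent) +
  facet_grid(
    ~ equipment,
    labeller = as_labeller(c("1" = "have enough equipment", "0" = "don't have enough equipment"))
  )

p
```

Because the faceting variable is also the grouping variable, normalising within each group is the same as normalising within each panel, which is why the bars in each facet sum to 100%.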