Why does geom_col add up the values instead of showing the mean?

Problem description

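When I use geom_col to plot the percentage of white residents by state (from the midwest dataset in the ggplot2 package), ggplot2 adds up the values instead of showing the mean. This seems like a very strange default to me; I don't think that is what a bar/column chart is supposed to do. I read the help documentation and did some googling, but maybe I wasn't searching for the right thing.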

library(ggplot2)

ggplot(data = midwest, mapping = aes(x = state, y = percwhite)) +
  geom_col()

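This plot clearly shows the sum of all values for each state. I want it to show the mean for each state. I've only been using R for a few weeks, but I can't believe I never noticed this before.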

Solution

First, create a table of means:

library(ggplot2)

means <- aggregate(percwhite ~ state, FUN = mean, data = midwest)

Now you can use that table to make the bar chart:

ggplot(data = means, mapping = aes(x = state, y = percwhite)) +
  geom_col()

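The code in the question produces "sums" because geom_col() defaults to position = "stack": every row of midwest (one county) is drawn as its own rectangle, and the rectangles belonging to the same state are stacked on top of each other, so each column's height is the total of percwhite over that state's counties.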
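The stacking can be confirmed by inspecting the data ggplot2 actually draws. The sketch below assumes a ggplot2 version that provides `layer_data()`; it takes the top (`ymax`) of each stacked column and compares it with the per-state sum of `percwhite`. The variable names (`drawn`, `bar_tops`, `state_sums`) are illustrative only.

```r
library(ggplot2)

# The plot from the question: one rectangle per county, stacked by state
p <- ggplot(data = midwest, mapping = aes(x = state, y = percwhite)) +
  geom_col()

# layer_data() returns the computed data for a layer, after stacking
drawn <- layer_data(p, 1)

# The visible height of each column is the largest ymax at that x position;
# unclass() drops ggplot2's discrete-position class so tapply() sees plain numbers
bar_tops <- tapply(drawn$ymax, unclass(drawn$x), max)

# Per-state sums; tapply() orders states alphabetically, as the discrete x scale does
state_sums <- tapply(midwest$percwhite, midwest$state, sum)

# Compare numerically (stacking accumulates, so allow for rounding error)
print(all.equal(as.vector(bar_tops), as.vector(state_sums)))
```

If the two agree, the "sum" in the original plot is purely a side effect of stacking one rectangle per row, not an aggregation step.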

Here are several possible ways to make a plot that shows the means:

library(ggplot2)

# the normal way of plotting data summaries like means is to use stat_summary()
ggplot(data = midwest, mapping = aes(x = state, y = percwhite)) +
  stat_summary(geom = "col",fun = mean)

# same plot using less intuitive code (avoid if possible)
ggplot(data = midwest, mapping = aes(x = state, y = percwhite)) +
  geom_bar(stat = "summary", fun = mean)

# same plot using base R functions to pre-compute the means
means.df <- aggregate(percwhite ~ state, FUN = mean, data = midwest)

ggplot(data = means.df, mapping = aes(x = state, y = percwhite)) +
  geom_col() # one value per column, so stacking has no effect

rm(means.df) # assuming it is no longer needed

# same plot using pipes and dplyr "verbs"
library(dplyr)
midwest %>%
  group_by(state) %>%
  summarise(percwhite = mean(percwhite)) %>%
  ggplot(mapping = aes(x = state, y = percwhite)) +
  geom_col()
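To check that these approaches really draw the same bars, one can compare the column heights ggplot2 computes for each plot. This is a sketch assuming ggplot2 >= 3.3.0 (for the `fun` argument of `stat_summary()`) and `layer_data()`; the helper `heights()` is a hypothetical name introduced only for this comparison.

```r
library(ggplot2)

# Let ggplot2 compute the means inside the plot
p_summary <- ggplot(data = midwest, mapping = aes(x = state, y = percwhite)) +
  stat_summary(geom = "col", fun = mean)

# Pre-compute the means, then draw one column per state
means.df <- aggregate(percwhite ~ state, FUN = mean, data = midwest)
p_precomputed <- ggplot(data = means.df, mapping = aes(x = state, y = percwhite)) +
  geom_col()

# Hypothetical helper: drawn column tops, ordered by x position
heights <- function(p) {
  d <- layer_data(p, 1)
  d$ymax[order(unclass(d$x))]
}

print(all.equal(heights(p_summary), heights(p_precomputed)))
```

Computing the summary inside the plot keeps the raw data intact, while pre-computing makes the plotted numbers easy to inspect or reuse.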

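Note that geom_bar() is very similar to the newer geom_col(). However, only geom_bar() accepts the stat argument, and with it fun (which is passed on to the summary stat); geom_col() always uses the identity stat.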
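The relationship can be seen directly: geom_col() leaves the data as is (identity stat), which geom_bar() does only when told to. The sketch below, assuming `layer_data()` is available, checks that `geom_col()` and `geom_bar(stat = "identity")` produce the same stacked rectangles for the question's data.

```r
library(ggplot2)

base <- ggplot(data = midwest, mapping = aes(x = state, y = percwhite))

# geom_col() always uses the identity stat
col_data <- layer_data(base + geom_col(), 1)

# geom_bar() defaults to stat = "count", so the identity stat must be requested
bar_data <- layer_data(base + geom_bar(stat = "identity"), 1)

# Same rectangle tops, row for row
print(all.equal(col_data$ymax, bar_data$ymax))
```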