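Problem description
When I use geom_col to plot the percentage of white residents in each state (from the midwest dataset in the ggplot2 package), ggplot2 adds up the values instead of averaging them. That seems like a very strange default to me; I did not think that was what a bar/column chart does. I read the help documentation and did some googling, but maybe I was not searching for the right thing.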
ggplot(data = midwest,mapping = aes(x = state,y = percwhite)) +
geom_col()
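This plot clearly returns the sum of all the values for each state. I want it to return the mean for each state. I have only been using R for a few weeks, but I can't believe I never noticed this before.
Solution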
First, create a table of means:
# a minimal sketch in base R; midwest_means is an illustrative name
midwest_means <- aggregate(percwhite ~ state, data = midwest, FUN = mean)
Now you can use that table to make the bar chart:
ggplot(data = midwest_means, mapping = aes(x = state, y = percwhite)) +
geom_col()
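The code in the question produces sums because geom_col() defaults to position = "stack": the midwest data has one row per county, so every county's value within a state is stacked on top of the others.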
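The stacking can be checked directly. The sketch below assumes that layer_data() from ggplot2 returns the stacked ymin/ymax for each county row; the names drawn, stack_tops, state_sums and state_means are illustrative only.

```r
library(ggplot2)

# Build the plot from the question, then inspect the data ggplot2 actually draws.
p <- ggplot(data = midwest, mapping = aes(x = state, y = percwhite)) +
  geom_col()
drawn <- layer_data(p)  # one row per county, with stacked ymin/ymax

# The top of each bar is the largest ymax at each x position (one per state).
stack_tops <- tapply(drawn$ymax, as.integer(drawn$x), max)

# Per-state sums and means computed directly from the data.
state_sums  <- tapply(midwest$percwhite, midwest$state, sum)
state_means <- tapply(midwest$percwhite, midwest$state, mean)

# Bar heights should match the sums, not the means.
heights_are_sums <- isTRUE(all.equal(as.vector(stack_tops), as.vector(state_sums)))
heights_are_means <- isTRUE(all.equal(as.vector(stack_tops), as.vector(state_means)))
print(heights_are_sums)
print(heights_are_means)
```

If the assumption holds, the bar heights agree with the per-state sums and not with the means, which is the behaviour described in the question.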
Here are several possible ways to make a plot that shows the means:
library(ggplot2)
# the normal way of plotting data summaries like means is to use stat_summary()
ggplot(data = midwest,mapping = aes(x = state,y = percwhite)) +
stat_summary(geom = "col",fun = mean)
# same plot using less intuitive code (avoid if possible)
ggplot(data = midwest, mapping = aes(x = state, y = percwhite)) +
geom_bar(stat = "summary",fun = mean)
# same plot using base R functions to pre-compute the means
means.df <- aggregate(percwhite ~ state,FUN = mean,data = midwest)
ggplot(data = means.df, mapping = aes(x = state, y = percwhite)) +
geom_col() # one value per column, so stacking has no effect
rm(means.df) # assuming it is no longer needed
# same plot using pipes and dplyr "verbs"
library(dplyr)
midwest %>%
group_by(state) %>%
summarise(percwhite = mean(percwhite)) %>%
ggplot(mapping = aes(x = state,y = percwhite)) +
geom_col()
Note that geom_bar() is very similar to the newer geom_col(). However, only geom_bar() takes the stat and fun arguments used above.
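As a quick sanity check, the stat_summary() version can be compared against means computed with aggregate(). This is only a sketch: it assumes layer_data() exposes the bar tops in a ymax column, and p_mean, drawn and check.df are illustrative names.

```r
library(ggplot2)

# Plot of per-state means, as in the stat_summary() approach above.
p_mean <- ggplot(data = midwest, mapping = aes(x = state, y = percwhite)) +
  stat_summary(geom = "col", fun = mean)

# Data actually drawn: one row per state after summarising.
drawn <- layer_data(p_mean)
drawn <- drawn[order(drawn$x), ]  # make sure rows follow the x-axis order

# Means computed independently in base R (states sorted alphabetically).
check.df <- aggregate(percwhite ~ state, data = midwest, FUN = mean)

# The bar tops should agree with the independently computed means.
bars_match_means <- isTRUE(all.equal(as.numeric(drawn$ymax), check.df$percwhite))
print(bars_match_means)
```

The same comparison could be repeated for the geom_bar(stat = "summary") variant by swapping the layer.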