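Problem description
I have a data frame with 2 columns that simulates NFL seasons: team and rank. I am trying to use ggridges to plot, for each team, the frequency distribution of ranks from 1 to 10. I can get the plot itself to work, but I would also like to show the count for each team/rank combination in each bin. So far I have not managed to do that.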
library(ggplot2)
library(ggridges)

ggplot(results, aes(x = rank, y = team, group = team)) +
  geom_density_ridges2(aes(fill = team), stat = "binline", binwidth = 1, scale = 0.9, draw_baseline = TRUE) +
  scale_x_continuous(limits = c(0, 11), breaks = seq(1, 10, 1)) +
  theme_ridges() +
  theme(legend.position = "none") +
  scale_fill_manual(values = c("#4F2E84", "#FB4F14", "#7C1415", "#A71930", "#00143F", "#0C264C", "#192E6C", "#136677", "#203731"), name = NULL)
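This creates the plot (image not included here). I tried adding the following layer to label the bins, without success: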
geom_text(stat='bin',aes(y = team + 0.95*stat(count/max(count)),label = ifelse(stat(count) > 0,stat(count),""))) +
This is not the exact data set, but it is at least enough to run the original plot:
results = data.frame(team = rep(c('Jets','Giants','Washington','Falcons','Bengals','Jaguars','Texans','Cowboys','Vikings'),1000),rank = sample(1:20,9000,replace = T))
Solution
How about calculating the count for each bin, joining it back to the original data, and using the new variable n as the label?
library(dplyr)    # for count(), left_join()
library(ggplot2)
library(ggridges)

results %>%
  count(team, rank) %>%                           # one row per team/rank, with its count in n
  left_join(results, by = c("team", "rank")) %>%  # back to one row per observation, each carrying n
  ggplot(aes(rank, team, group = team)) +
  geom_density_ridges2(aes(fill = team), stat = "binline", binwidth = 1, scale = 0.9, draw_baseline = TRUE) +
  scale_x_continuous(limits = c(0, 11), breaks = seq(1, 10, 1)) +
  theme_ridges() +
  theme(legend.position = "none") +
  scale_fill_manual(values = c("#4F2E84", "#FB4F14", "#7C1415", "#A71930", "#00143F", "#0C264C", "#192E6C", "#136677", "#203731"), name = NULL) +
  geom_text(aes(label = n), color = "white", nudge_y = 0.2)
Result: (image not included here)
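To make the count-then-join step easier to see, here is a minimal sketch on a tiny made-up data frame. The names toy, counts and joined are hypothetical and exist only for this illustration; they are not part of the original answer.

```r
library(dplyr)

# Toy data, invented for illustration only (not the asker's data)
toy <- data.frame(
  team = c("Jets", "Jets", "Jets", "Giants", "Giants"),
  rank = c(1, 1, 2, 3, 3)
)

# One row per team/rank combination; the number of rows in each group lands in column n
counts <- count(toy, team, rank)

# Joining back restores one row per original observation,
# so every observation now carries the count of its own bin
joined <- left_join(counts, toy, by = c("team", "rank"))

print(counts)
print(joined)
```

Because the join brings back every observation, geom_text draws one label per row, so identical labels are stacked on top of each other in each bin. That is harmless visually, but it is worth knowing if the data set is large.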
Neilfws's answer is good, but I have always found geom_ridgeline hard to work with in situations like this, so I usually recreate these plots with geom_rect:
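library(dplyr)
library(ggplot2)
library(ggridges)  # only needed for theme_ridges()

results %>%
  count(team, rank) %>%
  filter(rank <= 10) %>%
  mutate(team = factor(team)) %>%
  ggplot() +
  # each bar starts at the team's baseline and rises in proportion to n (tallest bar = 0.75)
  geom_rect(aes(xmin = rank - 0.5, xmax = rank + 0.5, ymin = team, fill = team, ymax = as.numeric(team) + n * 0.75 / max(n))) +
  geom_text(aes(x = rank, y = as.numeric(team) - 0.1, label = n)) +
  theme_ridges() +
  theme(legend.position = "none") +
  scale_fill_manual(values = c("#4F2E84", "#FB4F14", "#7C1415", "#A71930", "#00143F", "#0C264C", "#192E6C", "#136677", "#203731"), name = NULL) +
  ylab("team")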
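I particularly like the fine level of control that geom_rect gives compared with ridgelines. However, you do lose the nice outline drawn around each ridgeline, so if that matters to you, go with the other answer.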
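As a rough illustration of the control geom_rect gives, the sketch below computes the rectangle bounds by hand on a small invented table of counts. The names toy_counts and rects are hypothetical, and the 0.75 factor is simply the value used in the answer's code: it means the tallest bar takes up three quarters of the gap between two adjacent teams.

```r
library(dplyr)

# Invented counts for illustration only
toy_counts <- data.frame(
  team = factor(c("Giants", "Jets", "Jets")),
  rank = c(3, 1, 2),
  n    = c(2, 4, 1)
)

rects <- toy_counts %>%
  mutate(
    ymin = as.numeric(team),            # baseline: the factor's integer code
    ymax = ymin + n * 0.75 / max(n)     # height scaled so the largest count reaches 0.75
  )

print(rects)
```

Changing 0.75 changes how much the rows of bars overlap or leave space, which plays a role similar to the scale argument in the ridgeline version.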