Problem description
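# airquality ships with R's datasets package, so no attach() or copy is needed
breaks = seq(1.7,20.7,by=3.8)  # six edges, i.e. five bins of width 3.8
# right=FALSE gives left-closed intervals such as [1.7,5.5)
airquality.split = cut(airquality$Wind,breaks,right=FALSE)
airquality.freq = table(airquality.split)
# frequency, percentage, and their cumulative versions, one row per bin
airquality.dist = cbind(airquality.freq,
                        100*airquality.freq/sum(airquality.freq),
                        cumsum(airquality.freq),
                        100*cumsum(airquality.freq)/sum(airquality.freq))
colnames(airquality.dist) = c('Frequency','Percentage','Cum.Frequency','Cum.Percentage')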
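I would like to do the same, but taking the factor Month into account. That is, I want a single data frame with the Wind frequencies nested within each month, so that I can build a histogram from it.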
Month Wind Frequency Percentage Cum.Frequency Cum.Percentage
Month 1 [1.7,5.5) [...] [...] [...] [...]
Month 1 [5.5,9.3) [...] [...] [...] [...]
Month 1 [9.3,13.1) [...] [...] [...] [...]
Month 1 [13.1,16.9) [...] [...] [...] [...]
Month 1 [16.9,20.7) [...] [...] [...] [...]
Month 2 [1.7,5.5) [...] [...] [...] [...]
Month 2 [5.5,9.3) [...] [...] [...] [...]
Month 2 [9.3,13.1) [...] [...] [...] [...]
Month 2 [13.1,16.9) [...] [...] [...] [...]
Month 2 [16.9,20.7) [...] [...] [...] [...]
[...]
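With this data, I would like to draw a histogram with a separate series for each month, using one color per month, and five columns of percentage (or frequency) within each month. Is it possible to do this directly with the cut function?
Thanks in advance.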
Solution
With cut you can divide Wind into groups, and then use prop.table to calculate the proportions within each Month.
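library(dplyr)

# `breaks` is the vector defined in the question: seq(1.7,20.7,by=3.8)
airquality %>%
  # one row per Month/bin; include.lowest=TRUE keeps the maximum Wind (20.7) from becoming NA
  count(Month,group = cut(Wind,breaks,right=FALSE,include.lowest=TRUE),name = 'Frequency') %>%
  group_by(Month) %>%
  mutate(Percentage = prop.table(Frequency) * 100,
         Cum.Frequency = cumsum(Frequency),
         Cum.Percentage = Cum.Frequency/max(Cum.Frequency) * 100) %>%
  ungroup()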
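For comparison, the same per-month table can be sketched in base R. It also speaks to the original question: cut only does the binning, while table and ave handle the per-month bookkeeping. This is a rough sketch and not the method used in the answer. Two things here are assumptions of the sketch: include.lowest = TRUE is added so that the largest Wind value (20.7, equal to the last break) lands in the last bin instead of NA, and plot_wind_by_month is a hypothetical helper name made up for illustration.

```r
# Bin edges from the question.
breaks <- seq(1.7, 20.7, by = 3.8)

# include.lowest = TRUE (an assumption of this sketch) closes the last
# interval, so the maximum Wind value is binned instead of becoming NA.
wind_bin <- cut(airquality$Wind, breaks, right = FALSE, include.lowest = TRUE)

# Month x bin contingency table; bins with no observations stay as zeros.
counts <- table(Month = airquality$Month, Wind = wind_bin)

# Long format: one row per Month/bin pair, sorted by month and then by bin.
dist <- as.data.frame(counts, responseName = "Frequency")
dist <- dist[order(dist$Month, dist$Wind), ]
rownames(dist) <- NULL

# Per-month totals, repeated on every row of that month.
month_total <- ave(dist$Frequency, dist$Month, FUN = sum)

dist$Percentage     <- 100 * dist$Frequency / month_total
dist$Cum.Frequency  <- ave(dist$Frequency, dist$Month, FUN = cumsum)
dist$Cum.Percentage <- 100 * dist$Cum.Frequency / month_total

# Hypothetical helper: grouped bars, one group per month, one colour per month.
plot_wind_by_month <- function(counts) {
  pct <- 100 * prop.table(counts, margin = 1)  # percentages within each month
  # Repeat each month's colour once per bin so all bars of a month match.
  month_cols <- rep(seq_len(nrow(pct)) + 1, each = ncol(pct))
  barplot(t(pct), beside = TRUE, col = month_cols,
          xlab = "Month", ylab = "Percentage of days")
}
```

Calling plot_wind_by_month(counts) should draw five bars per month, one per Wind bin. Unlike the count-based version, this table keeps bins that are empty in a given month as rows with a frequency of zero, which keeps the number of bars per month constant.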