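Nested cut function to create a frequency table

Problem

I am building a frequency table, for example from the airquality dataset. The code is below: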

# Five equal-width Wind intervals, closed on the left: [1.7,5.5), [5.5,9.3), ...
breaks <- seq(1.7, 20.7, by = 3.8)
airquality.split <- cut(airquality$Wind, breaks, right = FALSE)
airquality.freq <- table(airquality.split)
# Columns: count, percentage, cumulative count, cumulative percentage
airquality.dist <- cbind(
  airquality.freq,
  100 * airquality.freq / sum(airquality.freq),
  cumsum(airquality.freq),
  100 * cumsum(airquality.freq) / sum(airquality.freq)
)
colnames(airquality.dist) <- c('Frequency', 'Percentage', 'Cum.Frequency', 'Cum.Percentage')

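I want to do the same thing, but taking the factor Month into account. That is, I want the whole data frame of Wind frequencies nested within each month, so that I can build a histogram from it.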

Month                           Frequency Percentage Cum.Frequency Cum.Percentage
Month 1          [1.7,5.5)          [...]  [...]           [...]       [...]
Month 1          [5.5,9.3)          [...]  [...]           [...]       [...]
Month 1          [9.3,13.1)         [...]  [...]           [...]       [...]
Month 1          [13.1,16.9)        [...]  [...]           [...]       [...]
Month 1          [16.9,20.7)        [...]  [...]           [...]       [...]
Month 2          [1.7,5.5)          [...]  [...]           [...]       [...]
Month 2          [5.5,9.3)          [...]  [...]           [...]       [...]
Month 2          [9.3,13.1)         [...]  [...]           [...]       [...]
Month 2          [13.1,16.9)        [...]  [...]           [...]       [...]
Month 2          [16.9,20.7)        [...]  [...]           [...]       [...]

[...]

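With this data I want to draw a histogram in which each month is a separate series in a single color, with five columns of percentages (or frequencies) per month. Is it possible to do this directly with the cut function?

Thanks in advance.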

Solution

With cut you can split Wind into groups, and then use prop.table within each Month to compute the proportions.

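library(dplyr)

airquality %>%
  # One row per Month and Wind interval; breaks is the vector defined in the question
  count(Month, group = cut(Wind, breaks, right = FALSE), name = 'Frequency') %>%
  group_by(Month) %>%
  # Percentages and running totals are computed within each month
  mutate(Percentage = prop.table(Frequency) * 100,
         Cum.Frequency = cumsum(Frequency),
         Cum.Percentage = Cum.Frequency / max(Cum.Frequency) * 100) %>%
  ungroup()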
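
For comparison, here is a minimal base R sketch of the same idea, in case dplyr is not available. It is not part of the original answer, and the object names (wind_group, freq, pct, cum_freq, cum_pct, dist) are made up for illustration. It cross-tabulates Month against the cut intervals and uses prop.table with margin = 1, so percentages are taken within each month. Two things differ from the code above and should be read as assumptions: include.lowest = TRUE is added so that a Wind reading equal to the last break (20.7 is the largest value in the stock dataset) lands in the final interval instead of becoming NA, and intervals with no observations are kept with a count of 0. The barplot call at the end is one possible way to get five bars per month.

```r
# Same breaks as in the question.
breaks <- seq(1.7, 20.7, by = 3.8)

# Assumption: include.lowest = TRUE closes the last interval, [16.9,20.7],
# so a reading equal to the top break is not turned into NA.
wind_group <- cut(airquality$Wind, breaks, right = FALSE, include.lowest = TRUE)

# Rows are months, columns are Wind intervals; empty intervals appear as 0.
freq <- table(Month = airquality$Month, Wind = wind_group)

# margin = 1 scales each row, so every month sums to 100.
pct <- 100 * prop.table(freq, margin = 1)

# Running totals along each row; apply() returns intervals x months, hence t().
cum_freq <- t(apply(freq, 1, cumsum))
cum_pct  <- t(apply(pct, 1, cumsum))

# Long format, one row per Month/interval pair. as.data.frame() on a table
# and as.vector() on a matrix both run through the rows (Month) fastest,
# so the added columns line up with the Frequency column.
dist <- as.data.frame(freq, responseName = "Frequency")
dist$Percentage     <- as.vector(pct)
dist$Cum.Frequency  <- as.vector(cum_freq)
dist$Cum.Percentage <- as.vector(cum_pct)
dist <- dist[order(dist$Month, dist$Wind), ]

# One cluster of five bars (one per Wind interval) for each month.
barplot(t(pct), beside = TRUE, xlab = "Month", ylab = "Percentage",
        legend.text = TRUE)
```

Printing dist shows the long table sketched in the question, with the real month numbers from the dataset in the Month column.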