Summarize percentages by group in R

Problem description

I would like to summarize my dataset by the percentage of women and men in each industry. I am still learning R and cannot figure this out.

My data:

Industry	Male	Female
Art/Entertainment	100	500
Banking	600	100
Healthcare	53	65
Education	20	766
Military	47	96
Medicine	500	400
Law	500	500
Computer	200	144
Sales	420	69

Goal:

The same table with, for each industry, the percentage of men and the percentage of women.

Solution

If your data is called df, you can create columns for the female and male percentages like this:

df$Fpct <- 100 * df$Female / (df$Male + df$Female)
df$Mpct <- 100 * df$Male / (df$Male + df$Female)

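Note: avoid the % symbol in variable names. A name such as %F is not syntactic in R, so it has to be written in backticks (df$`%F`).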
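A minimal sketch of this first approach, assuming the data frame df is rebuilt from the question's table with the column names Industry, Male and Female. It scales by 100 so the new columns are percentages, checks that the two percentages of each industry add up to 100, and shows what data.frame() does to a name like %F:

```r
# Assumed reconstruction of the question's data
df <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare", "Education",
               "Military", "Medicine", "Law", "Computer", "Sales"),
  Male   = c(100, 600, 53, 20, 47, 500, 500, 200, 420),
  Female = c(500, 100, 65, 766, 96, 400, 500, 144, 69)
)

# Row-wise shares, scaled to percentages
df$Fpct <- 100 * df$Female / (df$Male + df$Female)
df$Mpct <- 100 * df$Male / (df$Male + df$Female)

# Within each industry the two percentages add up to 100
stopifnot(isTRUE(all.equal(df$Fpct + df$Mpct, rep(100, nrow(df)))))
df[c("Industry", "Fpct", "Mpct")]

# data.frame() rewrites a non-syntactic name such as "%F" ...
names(data.frame("%F" = 1))                       # "X.F"
# ... unless check.names = FALSE, after which it needs backticks: x$`%F`
names(data.frame("%F" = 1, check.names = FALSE))  # "%F"
```

Names like Fpct and Mpct avoid both the renaming and the backticks.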

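1) proportions If your input is df1 (shown reproducibly in the Note at the end), change the column names to the desired ones and convert it to a matrix, m. Then use proportions with a margin of 1 for row proportions (a margin of 2 would give column proportions). We convert to a matrix in the first line because proportions requires it.

m <- as.matrix(setNames(df1[-1], c("%M", "%F")))
cbind(df1, 100 * proportions(m, 1))
##            Industry Male Female        %M       %F
## 1 Art/Entertainment  100    500 16.666667 83.33333
## 2           Banking  600    100 85.714286 14.28571
## 3        Healthcare   53     65 44.915254 55.08475
## ...snip...

2) rowSums An alternative is to divide df1[-1] by its rowSums, which gives the same result.

cbind(df1, setNames(100 * df1[-1] / rowSums(df1[-1]), c("%M", "%F")))
##            Industry Male Female        %M       %F
## 1 Art/Entertainment  100    500 16.666667 83.33333
## 2           Banking  600    100 85.714286 14.28571
## 3        Healthcare   53     65 44.915254 55.08475
## ...snip...

3) dplyr With dplyr, copy the columns under the indicated names, then multiply them by 100 and divide by the sum of the columns, using across and c_across.

library(dplyr)
df1 %>%
  group_by(Industry) %>%
  mutate(100 * across(c(Male, Female), .names = "%{.col}") / sum(c_across(c(Male, Female)))) %>%
  ungroup()
## # A tibble: 9 x 5
##   Industry           Male Female `%Male` `%Female`
##   <chr>             <int>  <int>   <dbl>     <dbl>
## 1 Art/Entertainment   100    500   16.7       83.3
## 2 Banking             600    100   85.7       14.3
## 3 Healthcare           53     65   44.9       55.1
## ...snip...

4) transform This one is close to another answer, but it does not overwrite the input:

transform(df1, "%M" = 100 * Male / (Male + Female), "%F" = 100 * Female / (Male + Female), check.names = FALSE)
##            Industry Male Female        %M       %F
## 1 Art/Entertainment  100    500 16.666667 83.33333
## 2           Banking  600    100 85.714286 14.28571
## 3        Healthcare   53     65 44.915254 55.08475
## ...snip...

Note

Input in reproducible form:

df1 <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare", "Education",
               "Military", "Medicine", "Law", "Computer", "Sales"),
  Male = c(100L, 600L, 53L, 20L, 47L, 500L, 500L, 200L, 420L),
  Female = c(500L, 100L, 65L, 766L, 96L, 400L, 500L, 144L, 69L)
)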
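Since approaches 1, 2 and 4 are said to give the same numbers, this can be checked directly. A minimal sketch, assuming df1 is rebuilt from the question's table with integer columns Male and Female; it uses base R only (proportions() needs R 4.0.0 or later), so the dplyr variant is left out:

```r
# Assumed reconstruction of the question's data
df1 <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare", "Education",
               "Military", "Medicine", "Law", "Computer", "Sales"),
  Male   = c(100L, 600L, 53L, 20L, 47L, 500L, 500L, 200L, 420L),
  Female = c(500L, 100L, 65L, 766L, 96L, 400L, 500L, 144L, 69L)
)

# 1) proportions(): row proportions (margin = 1) of a numeric matrix
m  <- as.matrix(setNames(df1[-1], c("%M", "%F")))
p1 <- 100 * proportions(m, 1)

# 2) rowSums(): the length-9 vector of row totals is recycled down each
#    column, so every element is divided by its own row total
p2 <- 100 * df1[-1] / rowSums(df1[-1])

# 4) transform(): check.names = FALSE keeps the "%M" / "%F" names
p4 <- transform(df1,
                "%M" = 100 * Male / (Male + Female),
                "%F" = 100 * Female / (Male + Female),
                check.names = FALSE)[c("%M", "%F")]

# Compare values only; names and row names differ between the objects
stopifnot(isTRUE(all.equal(p1, as.matrix(p2), check.attributes = FALSE)))
stopifnot(isTRUE(all.equal(p1, as.matrix(p4), check.attributes = FALSE)))
round(p1, 2)
```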

The janitor package has a simple solution for cross-tabulations:

library(janitor)

data %>% 
  adorn_totals(where = c("row","col")) %>% 
  adorn_percentages(denominator = "row") %>% 
  adorn_pct_formatting(digits = 0) %>% 
  adorn_ns(position = "front")

          Industry       Male     Female       Total
 Art/Entertainment  100 (17%)  500 (83%)  600 (100%)
           Banking  600 (86%)  100 (14%)  700 (100%)
        Healthcare   53 (45%)   65 (55%)  118 (100%)
         Education   20  (3%)  766 (97%)  786 (100%)
          Military   47 (33%)   96 (67%)  143 (100%)
          Medicine  500 (56%)  400 (44%)  900 (100%)
               Law  500 (50%)  500 (50%) 1000 (100%)
          Computer  200 (58%)  144 (42%)  344 (100%)
             Sales  420 (86%)   69 (14%)  489 (100%)
             Total 2440 (48%) 2640 (52%) 5080 (100%)

#OR

data %>% 
  adorn_percentages(denominator = "row") %>% 
  adorn_pct_formatting(digits = 2) %>% 
  adorn_ns(position = "front")

          Industry         Male       Female
 Art/Entertainment 100 (16.67%) 500 (83.33%)
           Banking 600 (85.71%) 100 (14.29%)
        Healthcare  53 (44.92%)  65 (55.08%)
         Education  20  (2.54%) 766 (97.46%)
          Military  47 (32.87%)  96 (67.13%)
          Medicine 500 (55.56%) 400 (44.44%)
               Law 500 (50.00%) 500 (50.00%)
          Computer 200 (58.14%) 144 (41.86%)
             Sales 420 (85.89%)  69 (14.11%)
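Without janitor, a similar "count (percent)" table can be approximated in base R. A minimal sketch, assuming data holds the question's table: addmargins() appends the totals (labelled Sum rather than Total), each row is divided by its own total, and sprintf() builds the labels. It is not janitor's implementation and does not reproduce its column alignment or rounding options:

```r
# Assumed reconstruction of the question's data
data <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare", "Education",
               "Military", "Medicine", "Law", "Computer", "Sales"),
  Male   = c(100L, 600L, 53L, 20L, 47L, 500L, 500L, 200L, 420L),
  Female = c(500L, 100L, 65L, 766L, 96L, 400L, 500L, 144L, 69L)
)

# Counts as a table with the industries as row names
m <- as.matrix(data[c("Male", "Female")])
rownames(m) <- data$Industry

# Add a "Sum" row and a "Sum" column (like adorn_totals(c("row", "col")))
tab <- addmargins(as.table(m))

# Row percentages: the row totals are recycled down each column
pct <- 100 * tab / tab[, "Sum"]

# "n (p%)" labels with 0 decimals, reshaped back to the table's layout
out <- matrix(sprintf("%d (%.0f%%)", c(tab), c(pct)),
              nrow = nrow(tab), dimnames = dimnames(tab))
noquote(out)
```

Using "%.2f%%" instead of "%.0f%%" gives two decimals, as in the second janitor example.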

Data used

> data
           Industry Male Female
1 Art/Entertainment  100    500
2           Banking  600    100
3        Healthcare   53     65
4         Education   20    766
5          Military   47     96
6          Medicine  500    400
7               Law  500    500
8          Computer  200    144
9             Sales  420     69

You can create two new columns, F% and M%.

Maybe you can use group_by for this.
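One way to use group_by here is to reshape to long format, so that each group is an industry and the shares are computed within the group. A minimal sketch, assuming the dplyr and tidyr packages are installed and data holds the question's table; the column names Sex, n and pct and the pct_ prefix are illustrative choices:

```r
library(dplyr)
library(tidyr)

# Assumed reconstruction of the question's data
data <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare", "Education",
               "Military", "Medicine", "Law", "Computer", "Sales"),
  Male   = c(100, 600, 53, 20, 47, 500, 500, 200, 420),
  Female = c(500, 100, 65, 766, 96, 400, 500, 144, 69)
)

shares <- data %>%
  # one row per industry and sex
  pivot_longer(c(Male, Female), names_to = "Sex", values_to = "n") %>%
  group_by(Industry) %>%
  # sum(n) is the industry total because the data are grouped by Industry
  mutate(pct = 100 * n / sum(n)) %>%
  ungroup()

# Back to one row per industry, with columns pct_Male and pct_Female
wide <- shares %>%
  select(Industry, Sex, pct) %>%
  pivot_wider(names_from = Sex, values_from = pct, names_prefix = "pct_")
wide
```

The long form is also convenient for plotting or for adding more categories than two.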
