Summarizing percentages by group in R

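Question

I would like to summarize my dataset as the percentage of females and males in each industry. I am still learning R and cannot figure this out.

My data: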

Industry           Male  Female
Art/Entertainment   100     500
Banking             600     100
Healthcare           53      65
Education            20     766
Military             47      96
Medicine            500     400
Law                 500     500
Computer            200     144
Sales               420      69

Goal:

Industry            F%   M%
Art/Entertainment  100  500
Banking            600  100
Healthcare          53   65
Education           20  766
Military            47   96
Medicine           500  400
Law                500  500
Computer           200  144
Sales              420   69

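Answers

If your data frame is named df, you can create columns for the female and male shares like this (the results are proportions between 0 and 1; multiply by 100 for percentages):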

df$Fpct <- df$Female / (df$Male + df$Female)
df$Mpct <- df$Male / (df$Male + df$Female)

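Note that it is best not to use the % symbol in variable names: such names are not syntactically valid in R and have to be quoted with backticks every time they are used.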
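As a minimal sketch of the approach above, the following uses the first three rows of the question's data and scales the shares to rounded percentages. The rounding to one decimal place and the helper variable total are illustrative choices, not part of the original answer.

```r
# First three rows of the data from the question
df <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare"),
  Male     = c(100, 600, 53),
  Female   = c(500, 100, 65)
)

# Row total, computed once and reused for both columns
total <- df$Male + df$Female

# Percentages (0 to 100), rounded to one decimal place
df$Fpct <- round(100 * df$Female / total, 1)
df$Mpct <- round(100 * df$Male / total, 1)

print(df)
```

Syntactic names such as Fpct and Mpct can be used directly as df$Fpct, without backticks.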


1) proportions. If your input is df1 (described in the Note at the end), change the column names to the desired names and convert the result to a matrix m. Then use proportions with a margin of 1 for row proportions; a margin of 2 would give column proportions. We convert to a matrix in the first line because proportions requires it.

m <- as.matrix(setNames(df1[-1], c("%M", "%F")))
cbind(df1, 100 * proportions(m, 1))
##            Industry Male Female        %M       %F
## 1 Art/Entertainment  100    500 16.666667 83.33333
## 2           Banking  600    100 85.714286 14.28571
## 3        Healthcare   53     65 44.915254 55.08475
## ...snip...

2) rowSums. An alternative is to divide df1[-1] by its rowSums, which gives the same result.

cbind(df1, setNames(100 * df1[-1] / rowSums(df1[-1]), c("%M", "%F")))
##            Industry Male Female        %M       %F
## 1 Art/Entertainment  100    500 16.666667 83.33333
## 2           Banking  600    100 85.714286 14.28571
## 3        Healthcare   53     65 44.915254 55.08475
## ...snip...

3) dplyr. Use across to copy the columns under the specified names, then multiply by 100 and divide by the sum over the columns, computed with c_across. Grouping by Industry makes that sum a per-industry total.

library(dplyr)

df1 %>%
  group_by(Industry) %>%
  mutate(100 * across(.names = "%{.col}") / sum(c_across())) %>%
  ungroup
## # A tibble: 9 x 5
##   Industry           Male Female `%Male` `%Female`
##   <chr>             <int>  <int>   <dbl>     <dbl>
## 1 Art/Entertainment   100    500   16.7       83.3
## 2 Banking             600    100   85.7       14.3
## 3 Healthcare           53     65   44.9       55.1
## ...snip...

4) transform. This one is close to the other answer, but it does not overwrite the input:

transform(df1, "%M" = 100 * Male / (Male + Female), "%F" = 100 * Female / (Male + Female), check.names = FALSE)
##            Industry Male Female        %M       %F
## 1 Art/Entertainment  100    500 16.666667 83.33333
## 2           Banking  600    100 85.714286 14.28571
## 3        Healthcare   53     65 44.915254 55.08475
## ...snip...

Note

The input df1 is a data frame with the columns Industry, Male and Female, holding the values shown in the question.
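As a minimal sketch, df1 could be built as follows. This is an assumed reconstruction: it takes the values from the question and uses integer counts, as the <int> columns in the dplyr output suggest. The checks at the end confirm that approaches 1 and 2 produce the same percentages.

```r
# Assumed reconstruction of df1 from the values in the question
df1 <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare", "Education",
               "Military", "Medicine", "Law", "Computer", "Sales"),
  Male   = c(100L, 600L, 53L, 20L, 47L, 500L, 500L, 200L, 420L),
  Female = c(500L, 100L, 65L, 766L, 96L, 400L, 500L, 144L, 69L)
)

# Approach 1: row proportions of a matrix
# (proportions() needs R >= 4.0.1; prop.table() is the older equivalent)
m  <- as.matrix(setNames(df1[-1], c("%M", "%F")))
p1 <- 100 * proportions(m, 1)

# Approach 2: divide each row by its row sum
p2 <- 100 * df1[-1] / rowSums(df1[-1])

# Both approaches agree, and every row of percentages sums to 100
stopifnot(isTRUE(all.equal(unname(p1), unname(as.matrix(p2)))))
stopifnot(isTRUE(all.equal(unname(rowSums(p1)), rep(100, nrow(df1)))))
```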

The janitor library, which is designed for cross-tabulation, offers a simple solution:

library(janitor)

data %>% 
  adorn_totals(where = c("row","col")) %>% 
  adorn_percentages(denominator = "row") %>% 
  adorn_pct_formatting(digits = 0) %>% 
  adorn_ns(position = "front")

          Industry       Male     Female       Total
 Art/Entertainment  100 (17%)  500 (83%)  600 (100%)
           Banking  600 (86%)  100 (14%)  700 (100%)
        Healthcare   53 (45%)   65 (55%)  118 (100%)
         Education   20  (3%)  766 (97%)  786 (100%)
          Military   47 (33%)   96 (67%)  143 (100%)
          Medicine  500 (56%)  400 (44%)  900 (100%)
               Law  500 (50%)  500 (50%) 1000 (100%)
          Computer  200 (58%)  144 (42%)  344 (100%)
             Sales  420 (86%)   69 (14%)  489 (100%)
             Total 2440 (48%) 2640 (52%) 5080 (100%)

#OR

data %>% 
  adorn_percentages(denominator = "row") %>% 
  adorn_pct_formatting(digits = 2) %>% 
  adorn_ns(position = "front")

          Industry         Male       Female
 Art/Entertainment 100 (16.67%) 500 (83.33%)
           Banking 600 (85.71%) 100 (14.29%)
        Healthcare  53 (44.92%)  65 (55.08%)
         Education  20  (2.54%) 766 (97.46%)
          Military  47 (32.87%)  96 (67.13%)
          Medicine 500 (55.56%) 400 (44.44%)
               Law 500 (50.00%) 500 (50.00%)
          Computer 200 (58.14%) 144 (41.86%)
             Sales 420 (85.89%)  69 (14.11%)

Data used:

> data
           Industry Male Female
1 Art/Entertainment  100    500
2           Banking  600    100
3        Healthcare   53     65
4         Education   20    766
5          Military   47     96
6          Medicine  500    400
7               Law  500    500
8          Computer  200    144
9             Sales  420     69

You can use group_by and then create two new columns with F% and M%.
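One possible reading of this suggestion, sketched with dplyr. This is not the answerer's original code. The input raw is hypothetical and has several rows per industry, which is the case where grouping matters; the column names F_pct and M_pct are also illustrative and avoid the % symbol.

```r
library(dplyr)

# Hypothetical input with several rows per industry (not from the question)
raw <- data.frame(
  Industry = c("Banking", "Banking", "Sales", "Sales"),
  Male     = c(400, 200, 300, 120),
  Female   = c(60, 40, 50, 19)
)

result <- raw %>%
  group_by(Industry) %>%
  # Collapse to one row per industry by summing the counts
  summarise(Male = sum(Male), Female = sum(Female), .groups = "drop") %>%
  # Add the two percentage columns
  mutate(
    F_pct = 100 * Female / (Male + Female),
    M_pct = 100 * Male / (Male + Female)
  )

print(result)
```

If the data already has exactly one row per industry, as in the question, the group_by and summarise steps can be dropped and mutate alone is enough.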
