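Problem description
I want to summarize my dataset by the percentage of women and men in each industry. I'm still learning R and can't figure this out.
My data: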
Industry | Male | Female |
---|---|---|
Art/Entertainment | 100 | 500 |
Banking | 600 | 100 |
Healthcare | 53 | 65 |
Education | 20 | 766 |
Military | 47 | 96 |
Medicine | 500 | 400 |
Law | 500 | 500 |
Computer | 200 | 144 |
Sales | 420 | 69 |
Goal:
Industry | Male | Female | F% | M% |
---|---|---|---|---|
Art/Entertainment | 100 | 500 |  |  |
Banking | 600 | 100 |  |  |
Healthcare | 53 | 65 |  |  |
Education | 20 | 766 |  |  |
Military | 47 | 96 |  |  |
Medicine | 500 | 400 |  |  |
Law | 500 | 500 |  |  |
Computer | 200 | 144 |  |  |
Sales | 420 | 69 |  |  |
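Solution
If your data is called df, you can add columns for the female and male shares like this:
df$Fpct <- df$Female / (df$Male + df$Female)
df$Mpct <- df$Male / (df$Male + df$Female)
These are proportions between 0 and 1; multiply by 100 if you want percentages. Also, avoid using the % symbol in variable names: a name like F% is not syntactically valid in R, so it has to be wrapped in backticks every time you use it.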
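A self-contained sketch of this approach, assuming the data frame has the columns Industry, Male and Female (the names shown in the other answers' outputs) and scaling by 100 so the stored values are percentages rather than proportions:

```r
# Rebuild the question's table (assumed column names: Industry, Male, Female)
df <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare", "Education",
               "Military", "Medicine", "Law", "Computer", "Sales"),
  Male   = c(100, 600, 53, 20, 47, 500, 500, 200, 420),
  Female = c(500, 100, 65, 766, 96, 400, 500, 144, 69),
  stringsAsFactors = FALSE
)

# Row total per industry, computed once and reused
total <- df$Male + df$Female

# Percentages (0-100); drop the "100 *" to keep proportions (0-1)
df$Fpct <- 100 * df$Female / total
df$Mpct <- 100 * df$Male / total

# Rounded copy for display only; df itself keeps full precision
print(transform(df, Fpct = round(Fpct, 1), Mpct = round(Mpct, 1)))
```

Computing the total once keeps the two new columns consistent with each other, so each row's Fpct and Mpct add up to 100.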
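1) proportions If your input is df1 (shown in reproducible form in the Note at the end), change the column names to the desired ones and convert the counts to a matrix m. Then use proportions with margin 1 for row proportions (margin 2 would give column proportions). We convert to a matrix in the first line because proportions requires one; proportions is available from R 4.0.0 onward, and prop.table is the older equivalent.
m <- as.matrix(setNames(df1[-1], c("%M", "%F")))
cbind(df1, 100 * proportions(m, 1))
## Industry Male Female %M %F
## 1 Art/Entertainment 100 500 16.666667 83.33333
## 2 Banking 600 100 85.714286 14.28571
## 3 Healthcare 53 65 44.915254 55.08475
## ...snip...
2) rowSums Another approach is to divide df1[-1] by its rowSums, which gives the same result.
cbind(df1, setNames(100 * df1[-1] / rowSums(df1[-1]), c("%M", "%F")))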
## Industry Male Female %M %F
## 1 Art/Entertainment 100 500 16.666667 83.33333
## 2 Banking 600 100 85.714286 14.28571
## 3 Healthcare 53 65 44.915254 55.08475
## ...snip...
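The claim that the proportions and rowSums approaches agree can be checked directly. This sketch rebuilds df1 from the question's table (the column names Industry, Male and Female are taken from the printed outputs) and compares the two results; it needs R 4.0.0 or later for proportions():

```r
# Question's data, assumed to be stored as df1 (Industry, Male, Female)
df1 <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare", "Education",
               "Military", "Medicine", "Law", "Computer", "Sales"),
  Male   = c(100, 600, 53, 20, 47, 500, 500, 200, 420),
  Female = c(500, 100, 65, 766, 96, 400, 500, 144, 69),
  stringsAsFactors = FALSE
)

# Approach 1: proportions() on a matrix, margin 1 = row proportions
m <- as.matrix(setNames(df1[-1], c("%M", "%F")))
pct_prop <- 100 * proportions(m, 1)

# Approach 2: a data frame divided by a vector recycles down each column,
# so every row is divided by its own row total
pct_rows <- setNames(100 * df1[-1] / rowSums(df1[-1]), c("%M", "%F"))

# Compare values only; all.equal() tolerates tiny floating-point differences
all.equal(unname(pct_prop), unname(as.matrix(pct_rows)))
```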
3) dplyr Using across, copy the columns under the indicated names, then multiply them by 100 and divide by the row total computed with c_across.
library(dplyr)
df1 %>%
  group_by(Industry) %>%
  mutate(100 * across(c(Male, Female), .names = "%{.col}") / sum(c_across(c(Male, Female)))) %>%
  ungroup
## # A tibble: 9 x 5
## Industry Male Female `%Male` `%Female`
## <chr> <int> <int> <dbl> <dbl>
## 1 Art/Entertainment 100 500 16.7 83.3
## 2 Banking 600 100 85.7 14.3
## 3 Healthcare 53 65 44.9 55.1
## ...snip...
4) transform This one is close to another answer, but it does not overwrite the input:
transform(df1, "%M" = 100 * Male / (Male + Female), "%F" = 100 * Female / (Male + Female), check.names = FALSE)
## Industry Male Female %M %F
## 1 Art/Entertainment 100 500 16.666667 83.33333
## 2 Banking 600 100 85.714286 14.28571
## 3 Healthcare 53 65 44.915254 55.08475
## ...snip...
Note
The input in reproducible form:
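A reconstruction of df1 from the question's table. The column names are taken from the printed outputs, and the integer column types are inferred from the `<int>` columns in the tibble output; both are assumptions about the original input:

```r
# df1 rebuilt from the question's table
df1 <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare", "Education",
               "Military", "Medicine", "Law", "Computer", "Sales"),
  Male   = c(100L, 600L, 53L, 20L, 47L, 500L, 500L, 200L, 420L),
  Female = c(500L, 100L, 65L, 766L, 96L, 400L, 500L, 144L, 69L),
  stringsAsFactors = FALSE
)
str(df1)
```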
,
The janitor package offers a simple solution for cross-tabulation:
library(janitor)
data %>%
adorn_totals(where = c("row","col")) %>%
adorn_percentages(denominator = "row") %>%
adorn_pct_formatting(digits = 0) %>%
adorn_ns(position = "front")
Industry Male Female Total
Art/Entertainment 100 (17%) 500 (83%) 600 (100%)
Banking 600 (86%) 100 (14%) 700 (100%)
Healthcare 53 (45%) 65 (55%) 118 (100%)
Education 20 (3%) 766 (97%) 786 (100%)
Military 47 (33%) 96 (67%) 143 (100%)
Medicine 500 (56%) 400 (44%) 900 (100%)
Law 500 (50%) 500 (50%) 1000 (100%)
Computer 200 (58%) 144 (42%) 344 (100%)
Sales 420 (86%) 69 (14%) 489 (100%)
Total 2440 (48%) 2640 (52%) 5080 (100%)
#OR
data %>%
adorn_percentages(denominator = "row") %>%
adorn_pct_formatting(digits = 2) %>%
adorn_ns(position = "front")
Industry Male Female
Art/Entertainment 100 (16.67%) 500 (83.33%)
Banking 600 (85.71%) 100 (14.29%)
Healthcare 53 (44.92%) 65 (55.08%)
Education 20 (2.54%) 766 (97.46%)
Military 47 (32.87%) 96 (67.13%)
Medicine 500 (55.56%) 400 (44.44%)
Law 500 (50.00%) 500 (50.00%)
Computer 200 (58.14%) 144 (41.86%)
Sales 420 (85.89%) 69 (14.11%)
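For readers without janitor, the first table's "count (row %)" layout with totals can be approximated in base R. This is a hand-rolled sketch, not janitor's implementation; the name formatted is illustrative, and rounding of exact .5 ties may differ from adorn_pct_formatting():

```r
data <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare", "Education",
               "Military", "Medicine", "Law", "Computer", "Sales"),
  Male   = c(100, 600, 53, 20, 47, 500, 500, 200, 420),
  Female = c(500, 100, 65, 766, 96, 400, 500, 144, 69),
  stringsAsFactors = FALSE
)

# Add a Total column (row sums) and a Total row (column sums)
counts <- as.matrix(data[c("Male", "Female")])
counts <- cbind(counts, Total = rowSums(counts))
counts <- rbind(counts, Total = colSums(counts))
rownames(counts) <- c(data$Industry, "Total")

# Row percentages: the divisor vector recycles down columns, so each
# cell is divided by its own row's Total
pct <- 100 * counts / counts[, "Total"]

# Format each cell as "count (pct%)", similar to adorn_ns(position = "front")
formatted <- matrix(sprintf("%d (%.0f%%)", as.integer(counts), as.vector(pct)),
                    nrow = nrow(counts), dimnames = dimnames(counts))
print(noquote(formatted))
```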
Data used
> data
Industry Male Female
1 Art/Entertainment 100 500
2 Banking 600 100
3 Healthcare 53 65
4 Education 20 766
5 Military 47 96
6 Medicine 500 400
7 Law 500 500
8 Computer 200 144
9 Sales 420 69
,
You can use group_by, then create two new columns, F% and M%.
Maybe you can use this:
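A minimal base-R sketch of grouping by industry and then adding F% and M% columns, using aggregate() as a stand-in for dplyr's group_by() (an assumption; this is not the answer's own code). Grouping only changes the result when an industry spans several rows, so the sketch adds one hypothetical extra Banking row:

```r
# The question's rows plus one hypothetical extra Banking row
raw <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare", "Education",
               "Military", "Medicine", "Law", "Computer", "Sales", "Banking"),
  Male   = c(100, 600, 53, 20, 47, 500, 500, 200, 420, 100),
  Female = c(500, 100, 65, 766, 96, 400, 500, 144, 69, 100),
  stringsAsFactors = FALSE
)

# Step 1: collapse to one row per industry (the group-and-sum step)
by_ind <- aggregate(cbind(Male, Female) ~ Industry, data = raw, FUN = sum)

# Step 2: add the percentage columns; check.names = FALSE keeps "%" in the names
by_ind <- transform(by_ind,
                    "F%" = 100 * Female / (Male + Female),
                    "M%" = 100 * Male / (Male + Female),
                    check.names = FALSE)
by_ind
```

aggregate() returns the industries in sorted order, and the two Banking rows are summed before the shares are computed.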