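Problem description
I am trying to collapse all the levels of a factor into a few groups (new levels) that I defined myself. I use the following code to remove missing values and unneeded levels from each variable defined as a factor. gss is the original dataset, and wrkslf and income06 are the variables I want to use.
library(dplyr)
v1 <- select(gss,wrkslf,income06) %>% na.omit() %>% filter(income06 != "Refused")
The problem with income06 is that it has too many levels. It looks like this: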
income06 count
Under $1 000 103
$1 000 To 2 999 95
$3 000 To 3 999 78
$4 000 To 4 999 48
$5 000 To 5 999 78
$6 000 To 6 999 91
$7 000 To 7 999 114
$8 000 To 9 999 179
$10000 To 12499 345
$12500 To 14999 291
...
So I tried running the following code to put the levels into larger groups that I created:
v2 = v1 %>% group_by(income06) %>% summarise(count = n())
v2 <- v2 %>% mutate(tidyIncomeLevel = recode(income06,
  "UNDER $1 000" = "Under $1,000",
  "$1 000 TO 2 999" = "Under $10,000",
  "$3 000 TO 3 999" = "Under $10,000",
  "$4 000 TO 4 999" = "Under $10,000",
  "$5 000 TO 5 999" = "Under $10,000",
  "$6 000 TO 6 999" = "Under $10,000",
  "$7 000 TO 7 999" = "Under $10,000",
  "$8 000 TO 8 999" = "Under $10,000",
  "$9 000 TO 9 999" = "Under $10,000",
  "$10000 TO 12499" = "Under $25,000",
  "$12500 TO 14999" = "Under $25,000",
  "$15000 TO 17499" = "Under $25,000",
  "$17500 TO 19999" = "Under $25,000",
  "$20000 TO 22499" = "Under $25,000",
  "$22500 TO 24999" = "Under $25,000",
  "$25000 TO 29999" = "Under $40,000",
  "$30000 TO 34999" = "Under $40,000",
  "$35000 TO 39999" = "Under $40,000",
  "$40000 TO 49999" = "Under $60,000",
  "$50000 TO 59999" = "Under $60,000",
  "$60000 TO 74999" = "Under $90,000",
  "$75000 TO 89999" = "Under $90,000",
  "$90000 TO 109999" = "Under $150,000",
  "$110000 TO 129999" = "Under $150,000",
  "$130000 TO 149999" = "Under $150,000"))
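Of course, I have installed all the packages needed for the functions above, but the recoding in v2 does not work at all. How can I fix the code so that I can use the combined groups as new levels? Do you have a better idea for making the output more concise?
Solution
The forcats package has a convenience function for exactly this: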
x <- c("A1","A2","B2","B1","A3")
library(forcats)
xCondense <- fct_collapse(x,A = c("A1","A3"),B = c("B1","B2"))
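Applied to the question's data, the same idea might look like the sketch below. It is only an illustration: the toy factor uses a handful of the level labels shown in the question's table (mixed case, "To" rather than "TO"), and the group names are assumptions modelled on the question. Level names are matched exactly, including case, so they have to be spelled as they appear in levels(income06).

```r
library(forcats)

# Toy factor built from a few level labels shown in the question's table
income06 <- factor(c("Under $1 000", "$1 000 To 2 999", "$8 000 To 9 999",
                     "$10000 To 12499", "$12500 To 14999"))

# Each new level lists the existing levels it should absorb
tidyIncomeLevel <- fct_collapse(
  income06,
  "Under $10,000" = c("Under $1 000", "$1 000 To 2 999", "$8 000 To 9 999"),
  "Under $25,000" = c("$10000 To 12499", "$12500 To 14999")
)

levels(tidyIncomeLevel)
table(tidyIncomeLevel)
```

Compared with a long recode() call, each target group is written once, which makes a misspelled or missing source level easier to spot.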
The forcats package has excellent tools for working with factors.
library(dplyr)
library(forcats)
gss <- data.frame(income0 = sample(1:9,size = 100,replace = TRUE) * 1000) %>%
mutate(income06 = fct_reorder(.f = factor(paste0("$",income0," TO ",income0 + 999)),.x = income0))
summarise(group_by(gss,income06),count = n())
# # A tibble: 9 x 2
# income06 count
# <fct> <int>
# 1 $1000 TO 1999 5
# 2 $2000 TO 2999 11
# 3 $3000 TO 3999 14
# ...
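# fct_recode() takes "new level" = "old level" pairs; old levels that are not
# present in this toy data ("UNDER $1 000", "$10000 TO 12499") should only produce a warning
gss <- gss %>% mutate(tidyIncomeLevel = fct_recode(income06,
  "Under $5,000" = "UNDER $1 000",
  "Under $5,000" = "$1000 TO 1999",
  "Under $5,000" = "$2000 TO 2999",
  "Under $5,000" = "$3000 TO 3999",
  "Under $5,000" = "$4000 TO 4999",
  "Under $10,000" = "$5000 TO 5999",
  "Under $10,000" = "$6000 TO 6999",
  "Under $10,000" = "$7000 TO 7999",
  "Under $10,000" = "$8000 TO 8999",
  "Under $10,000" = "$9000 TO 9999",
  "Under $25,000" = "$10000 TO 12499"))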
summarise(group_by(gss,tidyIncomeLevel),count = n())
# # A tibble: 2 x 2
# tidyIncomeLevel count
# <fct> <int>
# 1 Under $5,000 39
# 2 Under $10,000 61
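A likely reason the recode() call in the question appears to do nothing is that the old labels are written in upper case ("UNDER", "TO") while the levels printed in the question use mixed case ("Under", "To"). The sketch below, on a toy factor with three of those labels, shows one way to check for such mismatches before recoding. The variable names wanted and income06_upper are made up for illustration.

```r
library(forcats)

# Toy factor using level labels as printed in the question's table
income06 <- factor(c("Under $1 000", "$1 000 To 2 999", "$3 000 To 3 999"))

# Old labels as spelled in the question's recode() call
wanted <- c("UNDER $1 000", "$1 000 TO 2 999", "$3 000 TO 3 999")

# Labels that do not exist among the factor's levels and so cannot be matched
setdiff(wanted, levels(income06))

# One possible fix: normalise the case of the levels before recoding
income06_upper <- fct_relabel(income06, toupper)
setdiff(wanted, levels(income06_upper))
```

If the second setdiff() comes back empty, every label in the recoding has a matching level. Note also that the question's table shows a single level "$8 000 To 9 999", whereas the recode() call splits it into two labels that would not match even after fixing the case.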