Create a categorical variable

Problem description

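I want to categorize a variable using the following conditions:

0 - 4: "fail"
5 - 7: "good"
8 - 10: "excellent"
none of the above: NA

I tried using the recode function.

The values of the variable are numeric: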

segur <- data$segur

I created a new variable with recode:

dt1 <- recode(segur,"c(0,4)='suspenso';c(5,7)='aceptable';c(8,10)='excelente'; else='NA'")
dt1

How can I fix this?
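A likely cause, assuming recode here is car::recode: in its recode string, c(0,4) lists the two individual values 0 and 4, not the range 0 to 4 (car::recode writes ranges as lo:hi, e.g. 0:4), and else='NA' yields the character string "NA" rather than a missing value. The sketch below reuses the sample values from the cut example and shows the set-versus-range difference in base R; the car::recode call in the final comment is an untested assumption about the corrected syntax.

```r
# Sample scores (same values as the cut example), including values outside 0-10
segur <- c(6, 1, 4, -2, -1, 10, 8, 0, 5, 9)

# c(0, 4) is a set of two values: only exact 0s and 4s match
in_set <- segur %in% c(0, 4)

# A range test matches every value from 0 to 4 inclusive
in_range <- segur >= 0 & segur <= 4

sum(in_set)    # 2 (the 4 and the 0)
sum(in_range)  # 3 (the 1, the 4 and the 0)

# Assuming car::recode, a range-based version of the question's call would be:
# recode(segur, "0:4='suspenso'; 5:7='aceptable'; 8:10='excelente'; else=NA")
```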

Answers

Use factor in base R

Data:

# set random seed
set.seed(1L)
# without any NA
x1 <- sample(x = 1:10,size = 20,replace=TRUE)
# with NA
x2 <- sample(x = c(1:10,NA),replace=TRUE)

Code:

# without any NA
as.character(factor(x1,levels = c(0:10),labels = c(rep("fail",5),rep("good",3),rep("excellent",3)),exclude=NA))

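# with NA: missing values (and anything outside 0-10) are not in levels, so they stay NA
as.character(factor(x2,levels = c(0:10),labels = c(rep("fail",5),rep("good",3),rep("excellent",3)),exclude=NA))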
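The factor approach works because current versions of R let factor() take duplicated labels and merge the levels that share a label, so the 11 levels 0:10 collapse into three. A small check on hypothetical scores is below. Note that factor() matches values exactly, so anything not listed in levels (11, a non-integer such as 4.5, or NA) comes out as NA.

```r
# Hypothetical scores: in range, out of range, non-integer, and missing
scores <- c(0, 4, 5, 7, 8, 10, 11, 4.5, NA)

# Duplicated labels merge levels 0-4, 5-7 and 8-10 into three levels
f <- factor(scores,
            levels = 0:10,
            labels = c(rep("fail", 5), rep("good", 3), rep("excellent", 3)))

levels(f)        # "fail" "good" "excellent"
as.character(f)  # 11, 4.5 and NA all become NA
```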

You can probably use cut, like this:

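cut(segur,c(0,4,7,10),labels = c("fail","good","excellent"),include.lowest = TRUE)

include.lowest = TRUE makes the first interval [0,4] instead of (0,4], so a score of 0 is labelled "fail" rather than NA.

Example:

> segur
 [1]  6  1  4 -2 -1 10  8  0  5  9

> cut(segur,c(0,4,7,10),labels = c("fail","good","excellent"),include.lowest = TRUE)
 [1] good      fail      fail      <NA>      <NA>      excellent excellent
 [8] fail      good      excellent
Levels: fail good excellent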
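To see how cut treats the boundaries, here is a sketch on hypothetical scores that include boundary and non-integer values. With the default right = TRUE the intervals are (0,4], (4,7] and (7,10]; include.lowest = TRUE closes the first one as [0,4]. Because the intervals touch, a value such as 4.5 is labelled "good", whereas the factor approach returns NA for it. This only matters if the scores can be non-integer.

```r
# Hypothetical scores, including boundary and non-integer values
scores <- c(0, 4, 4.5, 7, 7.5, 10, -1, 11)

cats <- cut(scores,
            breaks = c(0, 4, 7, 10),
            labels = c("fail", "good", "excellent"),
            include.lowest = TRUE)

# Convert the factor to character for easy comparison
as.character(cats)
# "fail" "fail" "good" "good" "excellent" "excellent" NA NA
```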

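Here is a solution using the fmtr package. Create a categorical format with the value and condition functions, then apply that format to the numeric data with the fapply function. For example: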

library(fmtr)

# Create sample data
df <- read.table(header = TRUE,text = '
ID  segur
1      0
2      8
3      5
4      11
5      7')

# Create format
fmt <- value(condition(x >= 0 & x <= 4, "fail"),
             condition(x >= 5 & x <= 7, "good"),
             condition(x >= 8 & x <= 10, "excellent"),
             condition(TRUE, NA))

# Apply categorization
df$segur_cat <- fapply(df$segur,fmt)

# View results
df
#   ID segur segur_cat
# 1  1     0      fail
# 2  2     8 excellent
# 3  3     5      good
# 4  4    11      <NA>
# 5  5     7      good
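If you would rather not add a package, the same condition-by-condition logic can be written in base R with nested ifelse(). This is a minimal sketch, not part of the fmtr answer: the helper name categorize_segur is hypothetical, and the data frame reproduces the sample data above.

```r
# Hypothetical helper: mirrors the fmtr conditions with nested ifelse()
categorize_segur <- function(x) {
  ifelse(x >= 0 & x <= 4, "fail",
         ifelse(x >= 5 & x <= 7, "good",
                ifelse(x >= 8 & x <= 10, "excellent", NA_character_)))
}

# Same sample data as the fmtr example
df <- data.frame(ID = 1:5, segur = c(0, 8, 5, 11, 7))

df$segur_cat <- categorize_segur(df$segur)
df
```

Unlike cut, this leaves gaps such as 4.5 as NA, matching the question's "none of the above" rule.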
