Filling a column with unite() using mutate and case_when statements in R

Problem description

I have a list of names and thresholds assigned to those names, which determine whether I have assigned each name appropriately.

You can recreate a test dataset with:

df <- data.frame(level1 = c("Eukaryota","Eukaryota","Eukaryota","Eukaryota","Eukaryota"),
                 level2 = c("Opisthokonta","Alveolata","Opisthokonta","Alveolata","Alveolata"),
                 level3 = c("Fungi","Ciliophora","Fungi","Ciliophora","Dinoflagellata"),
                 level4 = c("Basidiomycota","Spirotrichea","Basidiomycota","Spirotrichea","Dinophyceae"),
                 value = c("100;5;4;2","100;100;100;100","100;80;60;50","90;50;40;40","100;80;20;0"))

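I want to use the tidyverse mutate() and case_when() functions to find the taxonomic levels that pass the appropriate thresholds. The tidyverse statement below therefore splits the thresholds apart and then attempts this. Where I'm stuck: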

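  1. Using case_when() versus ifelse(): would ifelse() be more appropriate?
  2. I don't know how to fill the new column "Name_updated" with the concatenated level1 to levelX. unite() doesn't seem suitable at the moment, because it operates on the whole dataset. In reality I have many more columns, so doing this without the tidy level1:level3 syntax would be painful!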
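
On point 1, a minimal sketch assuming dplyr is installed; t2, t3, t4 and the depth codes 1 to 4 are illustrative names, with the thresholds typed in by hand from the five example rows. Both ifelse() and case_when() are vectorized, so either can be used inside mutate(). case_when() evaluates its conditions in order and uses the first one that is TRUE, which avoids nesting and makes the "threshold4 < 50 &" guards unnecessary:

```r
library(dplyr)

# Thresholds 2 to 4 of the five example rows, already numeric (illustrative)
t2 <- c(5L, 100L, 80L, 50L, 80L)
t3 <- c(4L, 100L, 60L, 40L, 20L)
t4 <- c(2L, 100L, 50L, 40L, 0L)

# case_when(): conditions are checked in order, the first TRUE one wins
depth_cw <- case_when(
  t4 >= 50 ~ 4L,
  t3 >= 60 ~ 3L,
  t2 >= 50 ~ 2L,
  TRUE ~ 1L
)

# The same rules written as nested ifelse() calls
depth_ie <- ifelse(t4 >= 50, 4L, ifelse(t3 >= 60, 3L, ifelse(t2 >= 50, 2L, 1L)))

depth_cw
# [1] 1 4 4 2 2
identical(depth_cw, depth_ie)
# [1] TRUE
```

The depths 1, 4, 4, 2, 2 correspond to how many levels each row of the desired output keeps.
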
library(dplyr)
library(tidyr)
df_updated <- df %>% 
  separate(value, c("threshold1","threshold2","threshold3","threshold4"), sep = ";") %>% 
  mutate(Name_updated = case_when(
    threshold4 >= 50 ~ unite(level1:level4, sep = ";"), # Fill with all taxonomic names to level4
    threshold4 < 50 & threshold3 >= 60 ~ unite(level1:level3, sep = ";"), # If the last threshold is < 50, only fill with taxonomic names to level3
    threshold4 < 50 & threshold3 < 60 & threshold2 >= 50 ~ unite(level1:level2, sep = ";"), # If the thresholds for levels 3 and 4 are below, fill only level1;level2
    TRUE ~ level1)) %>% # Otherwise fill with only level1
  data.frame
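
On point 2, the reason unite() does not fit here: it takes a whole data frame as its first argument and returns a data frame, so it works as a pipeline step rather than on bare column names inside case_when(). A minimal sketch assuming tidyr is installed; taxa and to_level3 are hypothetical names:

```r
library(tidyr)

taxa <- data.frame(
  level1 = c("Eukaryota", "Eukaryota"),
  level2 = c("Opisthokonta", "Alveolata"),
  level3 = c("Fungi", "Ciliophora"),
  level4 = c("Basidiomycota", "Spirotrichea"),
  stringsAsFactors = FALSE
)

# unite(data, col, ...): the data frame comes first, then the new column name,
# then a tidy-select range of columns to paste together.
# remove = FALSE keeps the original level columns next to the new one.
united <- unite(taxa, "to_level3", level1:level3, sep = ";", remove = FALSE)
united$to_level3
# [1] "Eukaryota;Opisthokonta;Fungi" "Eukaryota;Alveolata;Ciliophora"
```

One could build one such column per depth this way and let case_when() choose among them; that is an alternative to the answer's approach, not what the answer does.
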

Desired output

> df_updated$Name_updated
# Output of this new list:
Eukaryota
Eukaryota;Alveolata;Ciliophora;Spirotrichea
Eukaryota;Opisthokonta;Fungi;Basidiomycota
Eukaryota;Alveolata
Eukaryota;Alveolata

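The next step is to write a function that lets the user specify the thresholds used in the script, so I really need to explore and determine which thresholds pass.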
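
For that next step, one possible sketch in base R (not the method from the answer). assign_names() and its cutoffs and prefix arguments are hypothetical names; it assumes the value column holds one threshold per level separated by ";", with no missing values. A row keeps levels 1 to k, where k is the deepest level whose threshold meets its cutoff, so cutoffs = c(-Inf, 50, 60, 50) reproduces the case_when() rules in the question:

```r
# Test data from the question (five rows)
df <- data.frame(
  level1 = rep("Eukaryota", 5),
  level2 = c("Opisthokonta", "Alveolata", "Opisthokonta", "Alveolata", "Alveolata"),
  level3 = c("Fungi", "Ciliophora", "Fungi", "Ciliophora", "Dinoflagellata"),
  level4 = c("Basidiomycota", "Spirotrichea", "Basidiomycota", "Spirotrichea", "Dinophyceae"),
  value  = c("100;5;4;2", "100;100;100;100", "100;80;60;50", "90;50;40;40", "100;80;20;0"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: `cutoffs` holds one minimum per level, level1 first.
# -Inf for level1 means level1 is always kept (the TRUE ~ level1 fallback).
assign_names <- function(df, cutoffs = c(-Inf, 50, 60, 50), prefix = "level") {
  k <- length(cutoffs)
  # Split "a;b;c;d" into a numeric matrix with one column per level
  thr <- do.call(rbind, lapply(strsplit(as.character(df$value), ";", fixed = TRUE), as.numeric))
  # Logical matrix: does each threshold meet the cutoff of its level?
  pass <- sweep(thr[, seq_len(k), drop = FALSE], 2, cutoffs, ">=")
  # Deepest passing level per row
  depth <- apply(pass, 1, function(p) max(which(p)))
  lv <- as.matrix(df[paste0(prefix, seq_len(k))])
  vapply(seq_len(nrow(df)),
         function(i) paste(lv[i, seq_len(depth[i])], collapse = ";"),
         character(1))
}

assign_names(df)
assign_names(df, cutoffs = c(-Inf, 90, 90, 90))
```

With the default cutoffs the first call returns the five names listed under the desired output; the stricter second call keeps only level1 except for the row whose thresholds are all 100.
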

Solution

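The problem is the type of the columns that separate() creates. By default separate() uses convert = FALSE, so threshold1 to threshold4 are character columns, and a condition such as threshold4 >= 50 is evaluated as a string comparison (50 is coerced to "50") rather than a numeric one. Setting convert = TRUE turns them into integers. Separately, unite() expects a data frame as its first argument, so it cannot be applied to bare column names inside case_when(); the code below instead selects the level columns from the piped data (.) and pastes them together row-wise with reduce(str_c, sep = ";").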
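
The effect of convert = FALSE can be checked in base R alone. The three threshold strings below are made up for illustration; type.convert(as.is = TRUE) is what separate(convert = TRUE) applies to the new columns:

```r
# Made-up threshold strings, as separate() leaves them with convert = FALSE
t1_chr <- c("100", "90", "5")

# Comparing character with numeric coerces 50 to "50", so this is a string
# comparison: "100" sorts before "50", and "5" sorts before "50"
t1_chr >= 50
# [1] FALSE  TRUE FALSE

# The conversion separate(convert = TRUE) performs
t1_num <- type.convert(t1_chr, as.is = TRUE)
class(t1_num)
# [1] "integer"
t1_num >= 50
# [1]  TRUE  TRUE FALSE
```
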

library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
df %>% 
  type.convert(as.is = TRUE) %>%
  # convert = TRUE makes threshold1:threshold4 integer instead of character
  separate(value, c("threshold1", "threshold2", "threshold3", "threshold4"), sep = ";", convert = TRUE) %>% 
  mutate(Name_updated = 
     case_when(
      # `.` is the whole piped data frame; reduce() pastes the selected columns row-wise
      threshold4 >= 50 ~
         select(., starts_with("level")) %>% 
            reduce(str_c, sep = ";"),
      threshold4 < 50 & threshold3 >= 60 ~ 
         select(., level1:level3) %>%
            reduce(str_c, sep = ";"),
      threshold4 < 50 & threshold3 < 60 & threshold2 >= 50 ~ 
         select(., level1:level2) %>% 
            reduce(str_c, sep = ";"),
      TRUE ~ level1))
#  level1       level2         level3        level4 threshold1 threshold2 threshold3 threshold4
#1 Eukaryota Opisthokonta          Fungi Basidiomycota        100          5          4          2
#2 Eukaryota    Alveolata     Ciliophora  Spirotrichea        100        100        100        100
#3 Eukaryota Opisthokonta          Fungi Basidiomycota        100         80         60         50
#4 Eukaryota    Alveolata     Ciliophora  Spirotrichea         90         50         40         40
#5 Eukaryota    Alveolata Dinoflagellata   Dinophyceae        100         80         20          0
#                                 Name_updated
#1                                   Eukaryota
#2 Eukaryota;Alveolata;Ciliophora;Spirotrichea
#3  Eukaryota;Opisthokonta;Fungi;Basidiomycota
#4                         Eukaryota;Alveolata
#5                         Eukaryota;Alveolata
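
In the answer, select(...) %>% reduce(str_c, sep = ";") folds the selected columns together pairwise. The same folding can be sketched in base R with Reduce() and paste(); levels_df is a hypothetical two-row extract of the test data:

```r
# Hypothetical two-row extract of the level columns
levels_df <- data.frame(
  level1 = c("Eukaryota", "Eukaryota"),
  level2 = c("Alveolata", "Opisthokonta"),
  level3 = c("Ciliophora", "Fungi"),
  stringsAsFactors = FALSE
)

# A data frame is a list of columns, so Reduce() folds it column by column:
# paste(paste(level1, level2, sep = ";"), level3, sep = ";")
joined <- Reduce(function(acc, col) paste(acc, col, sep = ";"), levels_df)
joined
# [1] "Eukaryota;Alveolata;Ciliophora" "Eukaryota;Opisthokonta;Fungi"

# The same result in one call: each column becomes an argument to paste()
joined2 <- do.call(paste, c(levels_df, sep = ";"))
identical(joined, joined2)
# [1] TRUE
```

One difference from the answer's str_c(): paste() turns a missing value into the string "NA", whereas str_c() returns NA for that row.
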
