Check whether data frame column values exist in a list in R

Problem description

I have a color master stored as the list below:

master <- list("Beige" = c("light brown","light golden","skin"),
               "off-white" = c("off white","cream","light cream","dirty white"),
               "Metallic" = c("steel","silver"),
               "Multi-colored" = c("multi color","mixed colors","mix","rainbow"),
               "Purple" = c("lavender","grape","jam","raisin","plum","magenta"),
               "Red" = c("cranberry","strawberry","raspberry","dark cherry","cherry","rosered"),
               "Turquoise" = c("aqua marine","jade green"),
               "Yellow" = c("fresh lime"))

This is the data frame column I have:

df$color <- c('multi color','purple','steel','metallic','off white','raisin','strawberry','magenta','skin','Beige','Jade Green','cream','multi-colored','offwhite','rosered',"light cream")

Now I want to check whether each value in the column matches one of the list keys or one of the list values.

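For example:
1) If a value in the df column is off white, it should first look among the list keys (Beige, off-white, Metallic, ...) and, if it exists there, return that key.
2) If the value is light cream, it should also look through all the values under those keys, and it should then be treated as off-white.
3) Matching should ignore case (e.g. OffWhITe == offwhite) and spacing (e.g. off white == offwhite).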
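Rules 1 to 3 can be sketched in base R with exact matching on normalized strings: keys are checked first, then values, after lower-casing and dropping everything that is not a letter. This is a minimal sketch, not the answer's fuzzy-join approach; `normalize_color()` and `classify_color()` are hypothetical helper names, and treating hyphens the same as spaces is an assumption.

```r
master <- list("Beige" = c("light brown","light golden","skin"),
               "off-white" = c("off white","cream","light cream","dirty white"),
               "Metallic" = c("steel","silver"),
               "Multi-colored" = c("multi color","mixed colors","mix","rainbow"),
               "Purple" = c("lavender","grape","jam","raisin","plum","magenta"),
               "Red" = c("cranberry","strawberry","raspberry","dark cherry","cherry","rosered"),
               "Turquoise" = c("aqua marine","jade green"),
               "Yellow" = c("fresh lime"))

# Hypothetical helper: lower-case and keep letters only, so that
# "OffWhITe", "off white" and "off-white" all become "offwhite"
normalize_color <- function(x) gsub("[^a-z]", "", tolower(x))

# Hypothetical helper: rule 1 (match a key), then rule 2 (match a value)
classify_color <- function(x, master) {
  keys   <- names(master)
  values <- unlist(master, use.names = FALSE)
  owners <- rep(keys, lengths(master))      # key that owns each value
  nx <- normalize_color(x)
  by_key   <- keys[match(nx, normalize_color(keys))]
  by_value <- owners[match(nx, normalize_color(values))]
  ifelse(is.na(by_key), by_value, by_key)   # key match wins over value match
}

colors <- c("multi color","purple","steel","metallic","off white","raisin",
            "strawberry","magenta","skin","Beige","Jade Green","cream",
            "multi-colored","offwhite","rosered","light cream")
classify_color(colors, master)
```

Inputs that match neither a key nor a value come back as `NA`.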

Output
This should be the expected output:

df$output <- c("Multi-colored","Purple","Metallic","off-white","Red","Beige","Turquoise","Multi-colored","off-white")

Edit
Any of the values in c("multi color","rainbow","multicolored","MultI-cOlored","multi-colored","MultiColORed","Multi-colored") should be treated as Multi-colored.
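Under a letters-only, lower-case normalization, five of the spellings in the edit collapse to the key "Multi-colored" itself, while "multi color" and "rainbow" still have to be found among the list values. A small sketch; `normalize_color()` is a hypothetical helper and the normalization rule is an assumption.

```r
# Hypothetical helper: lower-case and keep letters only
normalize_color <- function(x) gsub("[^a-z]", "", tolower(x))

variants <- c("multi color", "rainbow", "multicolored", "MultI-cOlored",
              "multi-colored", "MultiColORed", "Multi-colored")
normalized <- normalize_color(variants)

# TRUE where the variant equals the normalized key "Multi-colored";
# FALSE for "multi color" (-> "multicolor") and "rainbow", which are
# matched through the values stored under master[["Multi-colored"]]
normalized == normalize_color("Multi-colored")
```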

Solution

Perhaps we can do a `stringdist_join` after stacking the `list` into a single data frame:

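library(dplyr)
library(tidyr)      # unnest()
library(fuzzyjoin)  # stringdist_right_join()
library(tibble)     # enframe()
# Flatten the list into one row per (name, color) pair, then fuzzy-join
# each df colour to every lookup colour within edit distance 3
enframe(master, value = 'color') %>%
      unnest(c(color)) %>% 
      type.convert(as.is = TRUE) %>% 
      stringdist_right_join(df %>%
             mutate(rn = row_number()), by = 'color', max_dist = 3) %>% 
      # keep the df colour; fall back to it when no list entry matched
      transmute(color = color.y, output = coalesce(name, color.y))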
# A tibble: 19 x 2
#   color         output       
#   <chr>         <chr>        
# 1 multi color   Multi-colored
# 2 purple        purple       
# 3 steel         Metallic     
# 4 metallic      metallic     
# 5 off white     off-white    
# 6 raisin        Purple       
# 7 strawberry    Red          
# 8 strawberry    Red          
# 9 magenta       Purple       
#10 skin          Beige        
#11 skin          Multi-colored
#12 Beige         Beige        
#13 Jade Green    Turquoise    
#14 cream         off-white    
#15 cream         Purple       
#16 multi-colored Multi-colored
#17 offwhite      off-white    
#18 rosered       Red          
#19 light cream   off-white    
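The `enframe()` + `unnest()` step flattens `master` into one row per (name, color) pair. Base R's `stack()` produces the same long form without tibble or tidyr; this is a sketch, and renaming the columns to `color`/`name` is an assumption made only so they line up with the pipeline in the answer.

```r
master <- list("Beige" = c("light brown","light golden","skin"),
               "off-white" = c("off white","cream","light cream","dirty white"),
               "Metallic" = c("steel","silver"),
               "Multi-colored" = c("multi color","mixed colors","mix","rainbow"),
               "Purple" = c("lavender","grape","jam","raisin","plum","magenta"),
               "Red" = c("cranberry","strawberry","raspberry","dark cherry","cherry","rosered"),
               "Turquoise" = c("aqua marine","jade green"),
               "Yellow" = c("fresh lime"))

# stack() returns a data frame: 'values' holds each colour and
# 'ind' (a factor) the list name it came from
lookup <- stack(master)
lookup$ind <- as.character(lookup$ind)
names(lookup) <- c("color", "name")   # same names as enframe()/unnest()
head(lookup, 3)
```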

Data

df <- structure(list(color = c("multi color","purple","steel","metallic","off white","raisin","strawberry","magenta","skin","Beige","Jade Green","cream","multi-colored","offwhite","rosered","light cream")),
                class = "data.frame", row.names = c(NA, -16L))
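The fuzzy join above keeps every lookup colour within `max_dist = 3`, so some inputs appear twice (strawberry, skin, cream). One way to get exactly one output per row is to swap in base R's `utils::adist()` (generalized Levenshtein distance) on normalized strings and keep only the closest candidate. This is a sketch under assumptions: `normalize_color()` is a hypothetical helper, keys are listed before values so ties favour a key match (rule 1), and `max_dist <- 1` is an assumed tolerance.

```r
master <- list("Beige" = c("light brown","light golden","skin"),
               "off-white" = c("off white","cream","light cream","dirty white"),
               "Metallic" = c("steel","silver"),
               "Multi-colored" = c("multi color","mixed colors","mix","rainbow"),
               "Purple" = c("lavender","grape","jam","raisin","plum","magenta"),
               "Red" = c("cranberry","strawberry","raspberry","dark cherry","cherry","rosered"),
               "Turquoise" = c("aqua marine","jade green"),
               "Yellow" = c("fresh lime"))
df <- data.frame(color = c("multi color","purple","steel","metallic","off white",
                           "raisin","strawberry","magenta","skin","Beige",
                           "Jade Green","cream","multi-colored","offwhite",
                           "rosered","light cream"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: lower-case and keep letters only
normalize_color <- function(x) gsub("[^a-z]", "", tolower(x))

# Candidates: every key (mapping to itself), then every value (mapping to its key)
cand_text <- c(names(master), unlist(master, use.names = FALSE))
cand_key  <- c(names(master), rep(names(master), lengths(master)))

# Distance matrix: one row per df colour, one column per candidate
d <- adist(normalize_color(df$color), normalize_color(cand_text))

best      <- apply(d, 1, which.min)          # first closest candidate per row
best_dist <- d[cbind(seq_len(nrow(d)), best)]
max_dist  <- 1                               # assumed tolerance
df$output <- ifelse(best_dist <= max_dist, cand_key[best], NA_character_)
df
```

Rows whose closest candidate is farther than `max_dist` get `NA` instead of a guessed colour.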
