In R, exclude strings when a shorter version exists

Problem description

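The data is a toy example. Each string in have is made of comma-separated letters. I want to exclude a string of n letters if a shorter string contains n-1 of its letters. For example, ADB is excluded because we have AB. ADE is kept because we have none of AD, AE or DE.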

have <- c('A,B','B,C','A,D,B','A,D,E')
want <- c('A,B','B,C','A,D,E')

I know grepl might be useful, but I am not sure how to do this in a computationally efficient way.
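To make the rule concrete, here is a minimal sketch of the pairwise test described above: a longer string is dropped when some shorter string contains at least n-1 of its letters. The helper name is_covered_by is hypothetical, and the sketch assumes letters are separated by a single comma.

```r
# Hypothetical helper: TRUE if `short` has fewer letters than `long`
# and at least length(long) - 1 of the letters of `long` occur in `short`.
is_covered_by <- function(long, short, sep = ",") {
  l <- strsplit(long, sep, fixed = TRUE)[[1]]
  s <- strsplit(short, sep, fixed = TRUE)[[1]]
  length(s) < length(l) && sum(l %in% s) >= length(l) - 1
}

is_covered_by("A,D,B", "A,B")  # TRUE: "A,B" holds 2 of the 3 letters, so ADB is dropped
is_covered_by("A,D,E", "A,B")  # FALSE: only "A" is shared
```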

Solution

Here is one approach that splits each string into its letters.

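# Split each string on the comma into a vector of letters
tmp <- strsplit(have, ',')
# Iterate over the index of tmp; sapply() returns TRUE for strings to drop
have[!sapply(seq_along(tmp), function(x) {
  one <- tmp[[x]]
  # Drop `one` if some other, shorter string contains at least n-1 of its letters
  any(sapply(tmp[-x], function(y) sum(one %in% y)) >= (length(one) - 1) &
      lengths(tmp[-x]) < length(one))
})]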

#[1] "A,B"   "B,C"   "A,D,E"

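sum(one %in% y) counts how many letters of the current string appear in the other string.
>= (length(one) - 1) enforces the condition that at least n-1 of the letters are shared.
lengths(tmp[-x]) < length(one) ensures that the other string is shorter.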
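On the efficiency concern: if the letters within each string are unique, a shorter string can share n-1 letters with a string of n letters only when it has exactly n-1 letters, all taken from the longer string. That allows a lookup instead of comparing every pair of strings: build an order-insensitive key for every string, then check whether any (n-1)-letter subset of a string matches an existing key. The following is a minimal sketch under that unique-letters assumption; it is an alternative to the answer above, not the answer's method, and drop_if_shorter_subset and canonical_key are hypothetical names.

```r
# Hypothetical helper: order-insensitive key for a set of letters,
# so "B,A" and "A,B" both become "A,B".
canonical_key <- function(letters_vec) paste(sort(letters_vec), collapse = ",")

# Hypothetical function: drop strings that have an (n-1)-letter subset present
# in the input. Assumes no letter repeats within a single string.
drop_if_shorter_subset <- function(have, sep = ",") {
  tmp <- strsplit(have, sep, fixed = TRUE)
  keys <- vapply(tmp, canonical_key, character(1))
  excluded <- vapply(tmp, function(one) {
    n <- length(one)
    if (n < 2) return(FALSE)  # one-letter strings have no non-empty (n-1)-subset
    # Keys of all (n-1)-letter subsets of `one`
    sub_keys <- combn(one, n - 1, FUN = canonical_key)
    any(sub_keys %in% keys)
  }, logical(1))
  have[!excluded]
}

drop_if_shorter_subset(c('A,B','B,C','A,D,B','A,D,E'))
# returns c("A,B", "B,C", "A,D,E")
```

Each string of n letters generates only n subset keys, so the work grows with the total number of letters rather than with the number of string pairs.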
