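Problem description
This follows on from a resolved thread: matching strings regex exact match (thanks to @Onyambu for the updated code).
I need to match strings exactly, even when they contain special characters.
Note: sorry, this is my third question on this problem. I'm nearly there, but I don't know how to handle special characters, and I'm still on a steep learning curve with string handling in R.
Updated for clarity:
I have a table of match words/strings like this: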
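codes <- structure(
  list(
    column1 = structure(
      c(2L, 3L, NA), .Label = c("", "4+", "4 +"), class = "factor"
    ),
    # The third index was lost from the post as extracted; 3L is assumed.
    column2 = structure(
      c(1L, 2L, 3L), .Label = c("old", "the money", "work"), class = "factor"
    ),
    # The third index and the first label were lost as well; NA and "" are
    # assumed here, mirroring column1.
    column3 = structure(
      c(3L, 2L, NA), .Label = c("", "wonderyears", "woke"), class = "factor"
    )
  ),
  row.names = c(NA, -3L), class = "data.frame"
)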
I also have a dataset with a column of strings. I want to check whether each record's string contains any of the codes:
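# Index order c(2L, 4L, 3L, 1L) follows the output shown further down;
# class = "factor" on both columns is assumed (dropped from the post as extracted).
strings <- structure(
  list(
    SurveyID = structure(
      1:4, .Label = c("ID_1", "ID_2", "ID_3", "ID_4"), class = "factor"
    ),
    Open_comments = structure(
      c(2L, 4L, 3L, 1L), .Label = c(
        "I need to pick up some apples", "The system works",
        "Flag only if there is a 4 with a plus", "Show me the money"
      ), class = "factor"
    )
  ),
  class = "data.frame", row.names = c(NA, -4L)
)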
I am currently using the following code to match the codes against the strings:
strings[names(codes)] <- lapply(codes,function(x)
+(grepl(paste0("\\b",na.omit(x),"\\b",collapse = "|"),strings$Open_comments)))
Output:
  SurveyID                         Open_comments column1 column2 column3
1     ID_1                      The system works       0       0       0
2     ID_2                     Show me the money       0       1       0
3     ID_3 Flag only if there is a 4 with a plus       1       0       0
4     ID_4         I need to pick up some apples       0       0       0
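The problem is row 3 (ID_3): I only want to flag it when the string contains "4+" or "4 +", but it gets flagged regardless. Is there a way to capture the exact match?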
Solution
We can escape the + so that it is evaluated literally:
+(grepl(paste0( "(",gsub("\\+","\\\\+",na.omit(codes$column1)),")",collapse="|"),strings$Open_comments))
#[1] 0 0 0 0
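To see why the escape matters, here is a minimal sketch using only base R's grepl. The variable name comment is made up for this illustration. Unescaped, + is a quantifier meaning "one or more of the preceding item", so the pattern 4+ is satisfied by a bare 4.

```r
# Hypothetical variable holding the ID_3 comment from the question.
comment <- "Flag only if there is a 4 with a plus"

# Unescaped: "4+" means "one or more 4s", so the lone 4 matches.
unescaped <- grepl("\\b4+\\b", comment)

# Escaped: "\\+" demands a literal plus sign, which this comment lacks.
escaped <- grepl("4\\+", comment)

# fixed = TRUE treats the entire pattern as a literal string.
literal <- grepl("4+", comment, fixed = TRUE)

print(c(unescaped = unescaped, escaped = escaped, literal = literal))
```

Note that fixed = TRUE only suits a single literal pattern; it cannot be combined with | alternation or \\b, which is why the answer escapes the + instead.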
If we use a string that does contain 4+, it is picked up:
+(grepl(paste0( "(",gsub("\\+","\\\\+",na.omit(codes$column1)),")",collapse="|"),"Flag only if there is a 4+ with a plus"))
#[1] 1
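The gsub call above escapes only +. If the codes might contain other regex metacharacters (such as ., ?, or parentheses), a more general escape may help. The sketch below defines a hypothetical helper, escape_regex, which is not part of base R. It uses a widely circulated idiom that puts a backslash in front of the common metacharacters. It does not escape a literal backslash in the input.

```r
# Hypothetical helper (not a base R function): prefix common regex
# metacharacters with a backslash so they are matched literally.
escape_regex <- function(x) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", as.character(x))
}

# Illustrative codes, taken from column1 and column2 of the question.
example_codes <- c("4+", "4 +", "the money")
pattern <- paste0("(", escape_regex(example_codes), ")", collapse = "|")

# One comment from the question plus one made-up comment containing "4+".
comments <- c("Flag only if there is a 4 with a plus",
              "Flag only if there is a 4+ with a plus",
              "Show me the money")
flags <- +(grepl(pattern, comments))
print(flags)
```

With these inputs, only the second and third comments should be flagged.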
For multiple columns:
sapply(codes,function(x) +(grepl(paste0( "\\b(",gsub("\\+","\\\\+",na.omit(x)),")\\b",collapse="|"),strings$Open_comments)))
# column1 column2 column3
#[1,] 0 0 0
#[2,] 0 1 0
#[3,] 0 0 0
#[4,] 0 0 0
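One caveat with the \\b-wrapped pattern: \\b only matches between a word character and a non-word character, so a code ending in + that is followed by a space never satisfies the trailing \\b. A possible workaround, offered here as an assumption rather than as part of the original answer, is to use lookarounds with perl = TRUE, which only require that no word character touches the match. The variable names are made up for the illustration.

```r
# Made-up comment that really does contain the code "4+".
comment_with_plus <- "Flag only if there is a 4+ with a plus"

# Trailing "\\b" needs a word character right after the "+", so this misses.
with_boundary <- grepl("\\b(4\\+)\\b", comment_with_plus)

# Lookarounds: no word character immediately before or after the code.
with_lookaround <- grepl("(?<!\\w)(4\\+)(?!\\w)", comment_with_plus, perl = TRUE)

# The lookbehind still rejects "4+" embedded in a longer token such as "14+".
embedded <- grepl("(?<!\\w)(4\\+)(?!\\w)", "There are 14+ items", perl = TRUE)

print(c(with_boundary = with_boundary,
        with_lookaround = with_lookaround,
        embedded = embedded))
```

In the sapply call, this would mean replacing "\\b(" and ")\\b" with "(?<!\\w)(" and ")(?!\\w)" and passing perl = TRUE to grepl.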