R: How to use str_replace_all() without regular expressions

Question

I have some text data that contains the placeholders "[surname]", "[female name]" and "[male name]". For example:

c("I am [female name]. I am ten years old","My father is [male name][surname]","I went to school today") 

I want to remove them before analysis, so that I get:

"I am . I am ten years old","My father is ","I went to school today"

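But when I run the code below, the output comes back mangled. I suspect str_replace_all is interpreting the [ ] in the pattern as regular-expression syntax, but I'm not entirely sure why.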

> str_replace_all(c("I am [female name]. I am ten years old","My father is [male name][surname]","I went to school today"),"[surname]",'')

[1] "I  [fl ]. I  t y old" "My fth i [l ][]"      "I wt to chool tody"  
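The suspicion is correct: as a regex, "[surname]" is a bracket expression that matches any single one of the letters s, u, r, n, a, m, e, which is exactly the set of letters missing from the output. A minimal sketch, assuming the stringr package is installed, that checks this and shows stringr's fixed() modifier, which makes str_replace_all() match the pattern as literal text:

```r
library(stringr)

x <- c("I am [female name]. I am ten years old",
       "My father is [male name][surname]",
       "I went to school today")

# As a regex, "[surname]" is a character class: any ONE of s, u, r, n, a, m, e
as_regex <- str_replace_all(x, "[surname]", "")
identical(as_regex, str_replace_all(x, "[aemnrsu]", ""))  # TRUE
as_regex[2]  # "My fth i [l ][]"

# fixed() switches off regex interpretation, so the literal "[surname]" is removed
as_literal <- str_replace_all(x, fixed("[surname]"), "")
as_literal[2]  # "My father is [male name]"
```

This only removes one placeholder per call; the answer below handles all three at once.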

Does anyone know how to fix this? Thanks in advance.

Answer

Use stringi::stri_replace_all_fixed, which matches the patterns as literal strings rather than as regular expressions:

library(stringi)
data <- c("I am [female name]. I am ten years old","My father is [male name][surname]","I went to school today") 
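# Placeholders to strip, matched as literal text (no regex)
remove_us <- c("[female name]","[male name]","[surname]")
# vectorize_all=FALSE applies every pattern to every element of data
stri_replace_all_fixed(data,remove_us,"",vectorize_all=FALSE)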

Result

[1] "I am . I am ten years old" "My father is "             "I went to school today"   
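The vectorize_all=FALSE argument matters here. A minimal sketch, assuming stringi is installed, comparing it with the default vectorize_all=TRUE, where strings and patterns are paired element by element instead of every pattern being applied to every string:

```r
library(stringi)

data <- c("I am [female name]. I am ten years old",
          "My father is [male name][surname]",
          "I went to school today")
remove_us <- c("[female name]", "[male name]", "[surname]")

# vectorize_all = FALSE: each pattern is applied, in turn, to every string
all_patterns <- stri_replace_all_fixed(data, remove_us, "", vectorize_all = FALSE)

# Default (vectorize_all = TRUE): data[i] is paired with remove_us[i] only,
# so data[2] loses "[male name]" but keeps "[surname]"
pairwise <- stri_replace_all_fixed(data, remove_us, "")
pairwise[2]  # "My father is [surname]"
```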


However, base R's gsub is simpler: a single regular expression removes every bracketed token:

gsub('\\[[^][]*]','',data)

How the regex works:

--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  [^][]*                   any character except: ']','[' (0 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  ]                        ']'
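To see what the pattern actually matches, here is a small base-R sketch using regmatches() with gregexpr(). The last line illustrates a trade-off against the fixed-string approach: the regex strips any bracketed text, including content that is not one of the three placeholders (the "Footnote [1]" string is a made-up example).

```r
data <- c("I am [female name]. I am ten years old",
          "My father is [male name][surname]",
          "I went to school today")
pattern <- "\\[[^][]*]"

# List of matched substrings per element (character(0) when nothing matches)
matches <- regmatches(data, gregexpr(pattern, data))
matches[[2]]  # "[male name]" "[surname]"

# Caveat: ANY bracketed text is removed, not only the known placeholders
gsub(pattern, "", "Footnote [1] for [surname]")  # "Footnote  for "
```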