Problem description
How can I extract the string that lies outside the square brackets?
My example data:
test <- structure(list(Site = c("DavidsonSimpson","DavidsonSimpson"),Measurement = c("Depth From Measuring Point [Manual Water Level]","HB Datum minus Depth From MP [Manual Water Level]")),row.names = c(NA,-2L),class = "data.frame")
Extracting the string inside the brackets works:
library(dplyr)
library(stringr)
test1 <- test %>%
  # Source = the text between "[" and "]"
  mutate(Source = str_extract(Measurement, "(?<=\\[)[^\\]]+"))
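For comparison, the same lookbehind extraction can be done in base R. This is a minimal sketch on the two sample strings, copied into a plain character vector; `perl = TRUE` is required because R's default regex engine does not support lookbehind.

```r
# The two sample values of test$Measurement
measurement <- c("Depth From Measuring Point [Manual Water Level]",
                 "HB Datum minus Depth From MP [Manual Water Level]")

# (?<=\[) : start right after a literal "["
# [^\]]+  : then take every character up to (not including) the next "]"
inside <- regmatches(measurement,
                     regexpr("(?<=\\[)[^\\]]+", measurement, perl = TRUE))
print(inside)
```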
But how do I extract the string outside the brackets?
Solutions
You can use {unglue}:
library(unglue)
unglue_unnest(test,Measurement,"{Source} [{}]",remove = FALSE)
#> Site Measurement
#> 1 DavidsonSimpson Depth From Measuring Point [Manual Water Level]
#> 2 DavidsonSimpson HB Datum minus Depth From MP [Manual Water Level]
#> Source
#> 1 Depth From Measuring Point
#> 2 HB Datum minus Depth From MP
If you want to keep both parts:
unglue_unnest(test, Measurement, "{Source1} [{Source2}]", remove = FALSE)
#> Site Measurement
#> 1 DavidsonSimpson Depth From Measuring Point [Manual Water Level]
#> 2 DavidsonSimpson HB Datum minus Depth From MP [Manual Water Level]
#> Source1 Source2
#> 1 Depth From Measuring Point Manual Water Level
#> 2 HB Datum minus Depth From MP Manual Water Level
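The pattern "{Source1} [{Source2}]" splits each value at " [" and drops the closing "]". Roughly the same split can be written by hand as a regex with two capture groups. This is a base-R sketch for illustration, not how {unglue} is implemented, and it assumes every value ends with exactly one bracketed part.

```r
measurement <- c("Depth From Measuring Point [Manual Water Level]",
                 "HB Datum minus Depth From MP [Manual Water Level]")

# ^(.*) \[(.*)\]$ : group 1 = text before " [", group 2 = text inside the brackets
pattern <- "^(.*) \\[(.*)\\]$"
source1 <- sub(pattern, "\\1", measurement)
source2 <- sub(pattern, "\\2", measurement)
print(data.frame(Source1 = source1, Source2 = source2))
```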
We can use str_extract with a negated character class that matches everything before the first "[":
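library(dplyr)
library(stringr)
test %>%
  # take everything before the first "[" and trim the trailing space
  dplyr::mutate(Source = str_trim(str_extract(Measurement, '[^\\[]+')))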
# Site Measurement Source
#1 DavidsonSimpson Depth From Measuring Point [Manual Water Level] Depth From Measuring Point
#2 DavidsonSimpson HB Datum minus Depth From MP [Manual Water Level] HB Datum minus Depth From MP
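The negated class `[^\\[]+` stops at the "[", so the raw match still ends with the space that precedes the bracket; the printed data frame hides this. A small base-R check on the two sample strings makes the trailing space visible:

```r
measurement <- c("Depth From Measuring Point [Manual Water Level]",
                 "HB Datum minus Depth From MP [Manual Water Level]")

# [^\[]+ : the first run of characters that are not "[" (here, from the string start)
before <- regmatches(measurement, regexpr("[^\\[]+", measurement, perl = TRUE))
print(endsWith(before, " "))   # the space before "[" is part of the match
cleaned <- trimws(before)      # drop leading/trailing whitespace
print(cleaned)
```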
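You can use str_remove with the regex from your str_extract call, modified so that it also matches the brackets. This removes the words inside the brackets together with the brackets themselves.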
library(dplyr)
library(stringr)
test %>%
  # drop "[...]" together with the whitespace before it
  mutate(Source = str_remove(Measurement, "\\s*\\[[^\\]]+\\]"))
# Site Measurement Source
#1 DavidsonSimpson Depth From Measuring Point [Manual Water Level] Depth From Measuring Point
#2 DavidsonSimpson HB Datum minus Depth From MP [Manual Water Level] HB Datum minus Depth From MP
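`str_remove()` removes only the first match (`str_remove_all()` is the stringr function that removes every match). If a value could contain more than one bracketed part, which the sample data does not, a base-R sketch uses `gsub()`, which replaces all matches. The second string below is hypothetical.

```r
x <- c("Depth From Measuring Point [Manual Water Level]",
       "Hypothetical [a] value [b] with two groups")   # hypothetical input

# \s*       : optional whitespace before the opening bracket
# \[[^\]]*\] : "[", any characters except "]", then "]"
stripped <- gsub("\\s*\\[[^\\]]*\\]", "", x, perl = TRUE)
print(stripped)
```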
In base R, you can use sub:
test$Source <- sub('\\s\\[.*\\]','',test$Measurement)
# For this data, dropping everything from " [" onward works as well:
#test$Source <- sub('\\s\\[.*', '', test$Measurement)
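The shorter pattern only works "for this case" because nothing follows the closing bracket in the sample data. A sketch with a hypothetical value that has text after the bracket shows where the two patterns differ:

```r
y <- "Hypothetical [note] trailing text"   # hypothetical: text after the bracket

keep_tail <- sub("\\s\\[.*\\]", "", y)  # removes only " [...]"
drop_tail <- sub("\\s\\[.*", "", y)     # removes " [" and everything after it
print(keep_tail)  # "Hypothetical trailing text"
print(drop_tail)  # "Hypothetical"
```

Note that `.*` is greedy in both patterns, so with two bracketed parts the first pattern would also remove the text between them.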