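How to extract the string outside square brackets in R?

Problem description

How do I extract the text that lies outside the square brackets?

My example data: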

test <- structure(list(Site = c("DavidsonSimpson","DavidsonSimpson"),Measurement = c("Depth From Measuring Point [Manual Water Level]","HB Datum minus Depth From MP [Manual Water Level]")),row.names = c(NA,-2L),class = "data.frame")

Extracting the string inside the brackets works:

library(dplyr)
library(stringr)
test1 <- test %>% # capture the text that follows "[" up to the closing "]"
  mutate(Source = str_extract(Measurement,"(?<=\\[)[^]]+"))

But how do I extract the string outside the brackets?

Solution

You can use the {unglue} package:

library(unglue)

unglue_unnest(test,Measurement,"{Source} [{}]",remove = FALSE)
#>              Site                                       Measurement
#> 1 DavidsonSimpson   Depth From Measuring Point [Manual Water Level]
#> 2 DavidsonSimpson HB Datum minus Depth From MP [Manual Water Level]
#>                         Source
#> 1   Depth From Measuring Point
#> 2 HB Datum minus Depth From MP

If you want to keep both parts:

unglue_unnest(test,Measurement,"{Source1} [{Source2}]",remove = FALSE)
#>              Site                                       Measurement
#> 1 DavidsonSimpson   Depth From Measuring Point [Manual Water Level]
#> 2 DavidsonSimpson HB Datum minus Depth From MP [Manual Water Level]
#>                        Source1            Source2
#> 1   Depth From Measuring Point Manual Water Level
#> 2 HB Datum minus Depth From MP Manual Water Level

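We can use str_extract with a pattern that matches everything before the first [:

library(dplyr)
library(stringr)
test %>%
   dplyr::mutate(Source = str_extract(Measurement,'[^\\[]+'))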
#    Site                                       Measurement                        Source
#1 DavidsonSimpson   Depth From Measuring Point [Manual Water Level]   Depth From Measuring Point 
#2 DavidsonSimpson HB Datum minus Depth From MP [Manual Water Level] HB Datum minus Depth From MP 
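The Source values in the output above still end with the space that preceded the bracket. If that matters, one option is to trim the result. The following is a minimal sketch, assuming stringr is installed; the names measurement, raw and source_clean are illustrative and do not come from the original answer.

```r
library(stringr)

# Same strings as the Measurement column of the example data
measurement <- c(
  "Depth From Measuring Point [Manual Water Level]",
  "HB Datum minus Depth From MP [Manual Water Level]"
)

# '[^\\[]+' matches everything before the first '[',
# so the space in front of the bracket is part of the match
raw <- str_extract(measurement, "[^\\[]+")

# str_trim() removes leading and trailing whitespace
source_clean <- str_trim(raw)
source_clean
```

Inside a pipeline the same idea would be mutate(Source = str_trim(str_extract(Measurement, '[^\\[]+'))).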

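You can reuse the regular expression from your str_extract call in str_remove to delete the words inside the brackets, along with the brackets themselves.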

library(dplyr)
library(stringr)

test %>% 
  mutate(Source = str_remove(Measurement,"\\[[^]]+\\]"))

#             Site                                       Measurement                        Source
#1 DavidsonSimpson   Depth From Measuring Point [Manual Water Level]   Depth From Measuring Point 
#2 DavidsonSimpson HB Datum minus Depth From MP [Manual Water Level] HB Datum minus Depth From MP 
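Note that str_remove deletes only the first match in each string. The example data has a single bracketed group per value, so that is enough here. If a value could contain several groups, str_remove_all may be a closer fit. The following is a rough sketch; the input string x is made up for illustration and is not part of the question's data, and the pattern adds a leading \\s* so the space before each bracket goes too.

```r
library(stringr)

# Hypothetical value with two bracketed groups (not from the example data)
x <- "Depth [Manual] From MP [Auto]"

# Optional whitespace, then '[', anything that is not ']', then ']'
pattern <- "\\s*\\[[^\\]]+\\]"

first_only <- str_remove(x, pattern)      # removes the first group only
all_groups <- str_remove_all(x, pattern)  # removes every group

first_only
all_groups
```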

In base R, you can use sub:

test$Source <- sub('\\s\\[.*\\]','',test$Measurement)
# For this case this works as well:
# test$Source <- sub('\\s\\[.*', '', test$Measurement)
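If both parts are wanted without extra packages, the sub approach can be extended with capture groups. This is only a sketch under the assumption that each value has the form "text [text]" with one bracketed group at the end; the names pattern, outside and inside are illustrative.

```r
measurement <- c(
  "Depth From Measuring Point [Manual Water Level]",
  "HB Datum minus Depth From MP [Manual Water Level]"
)

# Group 1: text before the bracket (lazy, so the space before '[' is left out)
# Group 2: text between '[' and ']'
pattern <- "^(.*?)\\s*\\[([^]]*)\\]$"

# perl = TRUE enables the lazy quantifier '.*?'
outside <- sub(pattern, "\\1", measurement, perl = TRUE)
inside  <- sub(pattern, "\\2", measurement, perl = TRUE)

data.frame(outside, inside)
```

Strings that do not match the pattern are returned unchanged by sub, so values without brackets would appear as-is in both columns.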