Regular expressions in R

Problem description

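I am trying to extract names from a list of documents. The name is always the first occurrence of a "Last,First" pattern.

I am using stringr with the following regular expression, but it does not work: ^[A-Z][a-z]+,\s[A-Z][a-z]+$ I believe this is because the text that precedes the name is not constant across documents. See the example below.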


library(stringr)
m = c("   name: aaaaaa,bbbbbb  age: 25","age 34   person: aaaa,bbbb"," location: A  name 
 aaaa,bbbbbbb","aaaaa,bbbb")

str_extract(m,"^[A-Z][a-z]+,\\s[A-Z][a-z]+$")

# I tried to add a white space before and after the beginning of the pattern 
# but still not working:

str_extract(m,"^\\s[A-Z][a-z]+,\\s[A-Z][a-z]+$\\s")


Expected output, a list of names: aaaaaa,bbbbbb aaaa,bbbb aaaa,bbbbbbb aaaaa,bbbb

Thanks for any suggestions.

Solution

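Your regular expression is not quite right. Try [A-Za-z]+,\s*[A-Za-z]+ instead and see if it makes a difference. The caret (^) at the start of the expression only matches when the pattern begins at the start of the line, and the dollar sign ($) at the end only matches when the pattern ends at the end of the line. Because your names are surrounded by other text, both anchors have to go. I also merged the A-Z and a-z groups into a single character class, since the sample names are all lowercase, and removed the unnecessary backslashes. Note that the sample strings have no whitespace after the comma, so \s* (zero or more) is safer than \s, which requires exactly one whitespace character; inside an R string literal the backslash must be doubled, as in "\\s*". In future, testing your regular expressions with a tool such as RegExr may help and saves you the pain of rerunning your program repeatedly.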
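The effect of the anchors can be checked directly. The following is a minimal sketch using three strings adapted from the question; the variable names `anchored` and `unanchored` are introduced here for illustration only, and the `\\s*` (optional whitespace) is an assumption made because the sample data has no space after the comma.

```r
library(stringr)

# Sample strings adapted from the question
m <- c("   name: aaaaaa,bbbbbb  age: 25",
       "age 34   person: aaaa,bbbb",
       "aaaaa,bbbb")

# With ^ and $, the whole string must consist of the name and nothing else
anchored <- str_extract(m, "^[A-Za-z]+,\\s*[A-Za-z]+$")

# Without anchors, the name may appear anywhere in the string
unanchored <- str_extract(m, "[A-Za-z]+,\\s*[A-Za-z]+")

print(anchored)
print(unanchored)
```

Only the last string, which contains nothing but the name, matches the anchored pattern; the unanchored pattern finds a name in all three.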

# Base example
m = c("   name: aaaaaa,bbbbbb  age: 25","age 34   person: aaaa,bbbb"," location: A  name aaaa,bbbbbbb")


# This function implements the solution
extract_lastfirst <- function(x) {
  stopifnot(`"{stringr} is required"` = requireNamespace("stringr"))

  stringr::str_extract(x,"\\w+,\\w+") # This line solves the problem
}
extract_lastfirst(m)
#> [1] "aaaaaa,bbbbbb" "aaaa,bbbb"     "aaaa,bbbbbbb"

# The question mentions "the first occurrence of", so try
# the solution on an example that has a "second" occurrence.
n <- c("name: aa,bb wrong: cc,dd")
extract_lastfirst(n)
#> [1] "aa,bb"




# formal tests for the solution ---------------------------------------
# (no output means test passed)

library(testthat)

testthat::test_that("goal achieved",{
  expected_out_m <- c("aaaaaa,bbbbbb","aaaa,bbbb","aaaa,bbbbbbb")
  expect_equal(extract_lastfirst(m),expected_out_m)
})

testthat::test_that("multiple occurrences",{
  expected_out_n <- c("aa,bb")
  expect_equal(extract_lastfirst(n),expected_out_n)
})
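The "first occurrence" behaviour tested above comes from `str_extract()` itself, which returns only the first match in each string. As a small sketch of the difference, `str_extract_all()` returns every match instead; the names `first_match` and `all_matches` are illustrative and not part of the original answer.

```r
library(stringr)

# One string containing two "Last,First"-style pairs
n <- c("name: aa,bb wrong: cc,dd")

# str_extract() stops at the first match in each string
first_match <- str_extract(n, "\\w+,\\w+")

# str_extract_all() returns a list with one character vector per input string
all_matches <- str_extract_all(n, "\\w+,\\w+")

print(first_match)
print(all_matches)
```

Keep in mind that `\\w` also matches digits and underscores, so a stricter class such as `[A-Za-z]` may be preferable if the documents can contain comma-separated numbers.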

Created on 2020-09-02 by the reprex package (v0.3.0)

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.2 (2020-06-22)
#>  os       Ubuntu 20.04.1 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Rome                 
#>  date     2020-09-02                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.2)
#>  backports     1.1.9   2020-08-24 [1] CRAN (R 4.0.2)
#>  callr         3.4.3   2020-03-28 [1] CRAN (R 4.0.2)
#>  cli           2.0.2   2020-02-28 [1] CRAN (R 4.0.2)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.2)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 4.0.2)
#>  devtools      2.3.1   2020-07-21 [1] CRAN (R 4.0.2)
#>  digest        0.6.25  2020-02-23 [1] CRAN (R 4.0.2)
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.2)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.2)
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.2)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
#>  highr         0.8     2019-03-20 [1] CRAN (R 4.0.2)
#>  htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.2)
#>  knitr         1.29    2020-06-23 [1] CRAN (R 4.0.2)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 4.0.2)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 4.0.2)
#>  pkgbuild      1.1.0   2020-07-13 [1] CRAN (R 4.0.2)
#>  pkgload       1.1.0   2020-05-29 [1] CRAN (R 4.0.2)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.2)
#>  processx      3.4.3   2020-07-05 [1] CRAN (R 4.0.2)
#>  ps            1.3.4   2020-08-11 [1] CRAN (R 4.0.2)
#>  R6            2.4.1   2019-11-12 [1] CRAN (R 4.0.2)
#>  remotes       2.2.0   2020-07-21 [1] CRAN (R 4.0.2)
#>  rlang         0.4.7   2020-07-09 [1] CRAN (R 4.0.2)
#>  rmarkdown     2.3     2020-06-18 [1] CRAN (R 4.0.2)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 4.0.2)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.2)
#>  stringi       1.4.6   2020-02-17 [1] CRAN (R 4.0.2)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.2)
#>  testthat    * 2.3.2   2020-03-02 [1] CRAN (R 4.0.2)
#>  usethis       1.6.1   2020-04-29 [1] CRAN (R 4.0.2)
#>  withr         2.2.0   2020-04-20 [1] CRAN (R 4.0.2)
#>  xfun          0.16    2020-07-24 [1] CRAN (R 4.0.2)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.2)
#> 
#> [1] /home/cl/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library