Handling character(0) returned from str_extract_all

Question

While wrangling data, I want to clean up a column by extracting only the values that match a specific pattern.

library(dplyr)   # tibble(), mutate(), %>%
library(tidyr)   # unnest()

mytib <- tibble(
  a = c(1, 2, 3, 4, 5),
  b = c("aaa876", NA, "auy987 iuy876", "alsdjkf a", "1234 abc987"))

x <- mytib %>% 
  dplyr::mutate(b = stringr::str_extract_all(b, "[a-z]{3}[0-9]{3}")) %>% 
  unnest(b)

# Result:
# A tibble: 5 x 2
#       a b     
#   <dbl> <chr> 
# 1     1 aaa876
# 2     2 NA    
# 3     3 auy987
# 4     3 iuy876
# 5     5 abc987

Instead, I would like to get:

# A tibble: 6 x 2
#        a b     
#    <dbl> <chr> 
#  1     1 aaa876
#  2     2 NA    
#  3     3 auy987
#  4     3 iuy876
#  5     4 NA    
#  6     5 abc987

It seems that because the row with a == 4 ("alsdjkf a") does not match the pattern, str_extract_all returns character(0) for it, and unnest then drops that row from the final result.
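That diagnosis can be checked directly by looking at what str_extract_all returns before unnest is applied. The snippet below is a minimal sketch using three of the values from the example; the names pattern and res are only illustrative.

```r
library(stringr)

pattern <- "[a-z]{3}[0-9]{3}"

# One match, a missing value, and a string with no match
res <- str_extract_all(c("aaa876", NA, "alsdjkf a"), pattern)

# Number of extracted values per input string
lengths(res)
#> [1] 1 1 0
```

The NA input comes back as a length-1 NA, which is why the row with a == 2 survives, while the non-matching string comes back with length 0.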

Does anyone know how I can get the desired result some other way, or how I can handle character(0) so that unnest keeps the row where a == 4?

Answer

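Just add keep_empty = TRUE to unnest. By default, unnest drops any row whose list-column element has length 0; with keep_empty = TRUE such a row is kept and filled with NA: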

x <- mytib %>% 
  dplyr::mutate(b = stringr::str_extract_all(b,"[a-z]{3}[0-9]{3}")) %>% 
  unnest(b,keep_empty = TRUE)

x
#> # A tibble: 6 x 2
#>       a b     
#>   <dbl> <chr> 
#> 1     1 aaa876
#> 2     2 NA    
#> 3     3 auy987
#> 4     3 iuy876
#> 5     4 NA    
#> 6     5 abc987
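If keep_empty is not available (I believe it arrived with tidyr 1.0.0, but treat that version number as an assumption), one possible workaround is to replace each zero-length match vector with a single NA before unnesting. This is only a sketch, not the approach from the answer above; the result name y is illustrative.

```r
library(dplyr)
library(tidyr)
library(stringr)

mytib <- tibble(
  a = c(1, 2, 3, 4, 5),
  b = c("aaa876", NA, "auy987 iuy876", "alsdjkf a", "1234 abc987")
)

y <- mytib %>%
  mutate(
    b = str_extract_all(b, "[a-z]{3}[0-9]{3}"),
    # Swap every zero-length match vector for a single NA so unnest() keeps the row
    b = lapply(b, function(m) if (length(m) == 0) NA_character_ else m)
  ) %>%
  unnest(b)

y
```

Because every element of the list-column now has at least one value, unnest has nothing to drop, and the row with a == 4 appears with b set to NA.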
