Split a string by values from another column

Problem description

Hi, I have this data frame (DF1):

structure(list(Value = list("Peter","John",c("Patric","Harry")),Text = c("Hello Peter How are you","Is it John? Yes It is John,Harry","Hello Patric,how are you. Well,Harry thank you.")),class = "data.frame",row.names = c(NA,-3L)) 

            Value                                            Text
1           Peter                         Hello Peter How are you
2            John                Is it John? Yes It is John,Harry
3 c(Patric,Harry) Hello Patric,how are you. Well,Harry thank you.

I would like to split the sentences in Text at the names in Value to get this:

            Value                                            Text  Split
1           Peter                         Hello Peter How are you  c("Hello","Peter How are you")
2            John                Is it John? Yes It is John,Harry  c("Is it","John? Yes It is John,Harry")
3 c(Patric,Harry) Hello Patric,how are you. Well,Harry thank you.  c("Hello","Patric,how are you. Well,","Harry thank you.")

I tried:

DF1 %>% mutate(Split = strsplit(as.character(Text),as.character(Value)))

but it doesn't work.

Solution

Data

Assuming this is the actual structure:

df <- structure(list(Value = list("Peter","John",c("Patric","Harry")),Text = c("Hello Peter How are you","Is it John? Yes It is John,Harry","Hello Patric,how are you. Well,Harry thank you.")),class = "data.frame",row.names = c(NA,-3L)) 
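
As a rough illustration of why the original attempt falls short (a sketch using base R only; the names `patterns` and `attempt` are mine, not from the question): `as.character()` deparses each element of a list column into one string, so the two-name row becomes a single literal pattern, and `strsplit()` removes whatever it matches instead of keeping it.

```r
df <- structure(list(Value = list("Peter", "John", c("Patric", "Harry")),
                     Text = c("Hello Peter How are you",
                              "Is it John? Yes It is John,Harry",
                              "Hello Patric,how are you. Well,Harry thank you.")),
                class = "data.frame", row.names = c(NA, -3L))

# as.character() on a list column deparses each element to a single string
patterns <- as.character(df$Value)
patterns[3]
#> [1] "c(\"Patric\", \"Harry\")"

# strsplit() pairs each text with its pattern and drops the matched part
attempt <- strsplit(df$Text, patterns)
attempt[[1]]
#> [1] "Hello "       " How are you"

# Row 3: the deparsed pattern never matches, so the text comes back unsplit
length(attempt[[3]])
#> [1] 1
```

So the call runs, but the names are lost in rows 1 and 2, and row 3 is not split at all.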

First solution: nested for loop

You can solve this with a nested for loop. It is probably the more readable solution, and it is also easier to debug.

library(stringr)

Split <- vector("list",nrow(df))

for(i in seq_len(nrow(df))){
 
 text  <- df$Text[i]
 value <- df$Value[[i]]  # all names for this row
 
 for(j in seq_along(value)){
  
  # split the last piece just before the j-th name, so the name is kept
  text2 <- str_split(text[length(text)],paste0("(?<=.)(?=",value[[j]],")"),n = 2)[[1]]
  text <- c(text[-length(text)],text2)
  
 }
 
 Split[[i]] <- text
 
}

df$Split <- Split

If you print df, each row of Split looks like a single string, but it is not: every element is a character vector.

df$Split
#> [[1]]
#> [1] "Hello "            "Peter How are you"
#> 
#> [[2]]
#> [1] "Is it "                      "John? Yes It is John,Harry"
#> 
#> [[3]]
#> [1] "Hello "                    "Patric,how are you. Well,"
#> [3] "Harry thank you."
#> 
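
To see what the pattern built inside the loop does, here is a minimal sketch on a single string (the variable names `plain` and `kept` are only for illustration). Splitting on the bare name consumes it, whereas the lookaround pattern matches a zero-width position just before the name, so nothing is removed.

```r
library(stringr)

text <- "Hello Peter How are you"

# Splitting on the bare name consumes the name itself
plain <- str_split(text, "Peter", n = 2)[[1]]
plain
#> [1] "Hello "       " How are you"

# Zero-width pattern: (?=Peter) matches the position just before "Peter",
# and (?<=.) requires at least one character before that position
kept <- str_split(text, "(?<=.)(?=Peter)", n = 2)[[1]]
kept
#> [1] "Hello "            "Peter How are you"
```

The `n = 2` argument limits each call to a single split, which is why only the first occurrence of a name is used.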

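Second solution: tidyverse and a recursive function

Since you initially tried to use dplyr functions, you can also write it this way with a recursive function. This solution does not use for loops.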

library(stringr)
library(purrr)
library(dplyr)

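str_split_recursive <- function(string,pattern){
 
 # split the last piece just before the first remaining name, keeping the name
 string <- str_split(string[length(string)],paste0("(?<=.)(?=",pattern[1],")"),n = 2)[[1]]
 pattern <- pattern[-1]
 # recurse on the remaining names; only the last piece is split again
 if(length(pattern) > 0) string <- c(string[-length(string)],str_split_recursive(string,pattern))
 string
 
}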

df <- df %>% 
 mutate(Split = map2(Text,Value,str_split_recursive))
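
One caveat that may apply to either solution: the names are pasted into a regular expression, so a name containing characters such as `.` or `(` is not matched literally. A possible workaround, sketched below, is to quote the name with `\Q...\E`. The helper `literal_lookahead()` is hypothetical (not part of the original answer), and the sketch assumes no name contains a literal `\E`.

```r
library(stringr)

# Hypothetical helper: build the zero-width pattern with the name quoted
# via \Q...\E so that it is matched literally
literal_lookahead <- function(name) paste0("(?<=.)(?=\\Q", name, "\\E)")

text <- "Call Jx R. or J. R. today"

# Unquoted: each "." matches any character, so "Jx R." is hit first
unquoted <- str_split(text, "(?<=.)(?=J. R.)", n = 2)[[1]]
unquoted
#> [1] "Call "                "Jx R. or J. R. today"

# Quoted: only the literal "J. R." matches
quoted <- str_split(text, literal_lookahead("J. R."), n = 2)[[1]]
quoted
#> [1] "Call Jx R. or " "J. R. today"
```

With plain names like those in the example data, quoting makes no difference.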