Separate two columns into rows by semicolon in R

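Problem description

I have two columns, each containing multiple items separated by semicolons.

I want to split them into multiple rows, with the new rows matched according to the order of the items within the original row.

If I had only one column I would use separate_rows, but I don't know how to handle two columns that need to stay matched. It is easier to explain with an example:

Reproducible example: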

AU <- c("Ali,PB; Naylor,JC","Warren,EW; Stephens,D")
EM <- c("[email protected]; [email protected]","[email protected]; [email protected]")
question <- data.frame(AU,EM)

I would like the data frame to look like this:

1 Ali,PB [email protected]
2 Naylor,JC [email protected]
3 Warren,EW [email protected]
4 Stephens,D [email protected]

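Solution

tidyr to the rescue! separate_rows() is a new(?) function that does exactly what you want. Pass it both columns and they are split in parallel, so the pieces stay paired by position.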

tidyr::separate_rows(question,AU,EM,sep = ";",convert = T)
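Because the items in the example are separated by a semicolon followed by a space, a plain ";" separator can leave a leading space on the second item of each cell. A minimal sketch of a whitespace-tolerant variant, assuming tidyr is installed; the regular expression ";\\s*" and the name `result` are illustrative choices, not part of the original answer:

```r
library(tidyr)

AU <- c("Ali,PB; Naylor,JC", "Warren,EW; Stephens,D")
EM <- c("[email protected]; [email protected]", "[email protected]; [email protected]")
question <- data.frame(AU, EM, stringsAsFactors = FALSE)

# Split both columns in parallel; the regex consumes the semicolon
# and any whitespace that follows it, so no item keeps a leading space.
result <- separate_rows(question, AU, EM, sep = ";\\s*")
result
```

Both columns must contain the same number of pieces in every row for the parallel split to work.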

If you would rather not use Ben's nice tidyverse one-liner, and your data always pairs names and emails in the same position, you can also use a for loop (or vectorize the same idea if you need more speed):

AU <- c("Ali,PB; Naylor,JC","Warren,EW; Stephens,D")
EM <- c("[email protected]; [email protected]","[email protected]; [email protected]")
question <- data.frame(AU, EM, stringsAsFactors = FALSE)

# Start with an empty data frame and append one block of rows per original row
df <- data.frame(name=c(),email=c())
for(r in seq_len(nrow(question))){
  # Split each cell on "; " (semicolon plus space) to get the individual items
  a <- strsplit(question[r,1],"; ")[[1]]
  e <- strsplit(question[r,2],"; ")[[1]]
  
  # Names and emails are paired by position within the row
  df <- rbind(df,data.frame(name=a,email=e))
}
df
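For larger inputs, growing a data frame with rbind inside a loop can get slow. A rough base R sketch of a vectorized alternative, assuming every row has the same number of names and emails; the variable names `names_split`, `emails_split`, and `result` are made up for this illustration:

```r
AU <- c("Ali,PB; Naylor,JC", "Warren,EW; Stephens,D")
EM <- c("[email protected]; [email protected]", "[email protected]; [email protected]")
question <- data.frame(AU, EM, stringsAsFactors = FALSE)

# strsplit is vectorized: one call splits every cell of a column
names_split  <- strsplit(question$AU, "; ", fixed = TRUE)
emails_split <- strsplit(question$EM, "; ", fixed = TRUE)

# Guard against rows where the two columns hold different numbers of items
stopifnot(identical(lengths(names_split), lengths(emails_split)))

# Flatten both lists; the original order is preserved, so pairs stay aligned
result <- data.frame(
  name  = unlist(names_split),
  email = unlist(emails_split),
  stringsAsFactors = FALSE
)
result
```

This builds the output once instead of appending row by row, at the cost of failing loudly when a row is unbalanced.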

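Also, take care to split on "; " (semicolon followed by a space) rather than ";", because the second item in each cell is preceded by a space character.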
