Match two data frames with multiple columns and add a column after the match

Question

I have two data frames:

df1 <- data.frame(assoc = c(2,3.4,4.6,-2.3,-1,0.48,-0.4),con = c("A","B","C","D","E","F","T"))
df2 <- data.frame(pos = c("-3","-2","-1","0","1","2","3"),col1 = c("A","B","B","T","T","D","E"),col2 = c("B","T","D","A","E","C","F"))
View(df1)

con    assoc 
 A     2  
 B     3.4
 C     4.6
 D    -2.3  
 E    -1
 F     0.48
 T    -0.4

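I want to create a function that matches the data frames so that the values from df1 are added to df2 as new columns. The desired output looks like this: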

    pos   col1  con1  col2    con2 
    -3     A      2     B      3.4
    -2     B      3.4   T     -0.4
    -1     B      3.4   D     -2.3
     0     T     -0.4   A      2
     1     T     -0.4   E     -1 
     2     D     -2.3   C      4.6
     3     E     -1     F      0.48

I tried:

res <- merge(df1,df2)
View(res)

Unfortunately, this only works for one example. When I add a new column, it does not seem to work.

Any help would be appreciated!

Answers

You can use match over the two columns, i.e.

sapply(df2[-1],function(i)df1$assoc[match(i,df1$con)])

     col1  col2
[1,]  2.0  3.40
[2,]  3.4 -0.40
[3,]  3.4 -2.30
[4,] -0.4  2.00
[5,] -0.4 -1.00
[6,] -2.3  4.60
[7,] -1.0  0.48
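
To get from that matrix to the layout asked for in the question, the looked-up values can be bound back onto df2. The following is a minimal base R sketch, assuming df2 holds the values shown in the desired output. The names lookup, res, con1 and con2 are illustrative choices based on the question's target table, not something match produces by itself:

```r
df1 <- data.frame(assoc = c(2, 3.4, 4.6, -2.3, -1, 0.48, -0.4),
                  con = c("A", "B", "C", "D", "E", "F", "T"))
# Assumption: df2 values are read off the desired output in the question
df2 <- data.frame(pos = c("-3", "-2", "-1", "0", "1", "2", "3"),
                  col1 = c("A", "B", "B", "T", "T", "D", "E"),
                  col2 = c("B", "T", "D", "A", "E", "C", "F"))

# Look up assoc for every column of df2 except pos
lookup <- sapply(df2[-1], function(i) df1$assoc[match(i, df1$con)])

# Rename col1/col2 to con1/con2 and attach the values to df2
colnames(lookup) <- sub("col", "con", colnames(lookup))
res <- cbind(df2, lookup)

# Interleave the columns as in the desired output
res <- res[c("pos", "col1", "con1", "col2", "con2")]
res
```

Since match returns positions in df1$con in the order of the lookup values, the rows of df2 keep their original order.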

It looks like you want to join df1 onto df2 twice, by a different variable each time:

library(dplyr)

df2 %>%
  left_join(df1,by = c(col1 = "con")) %>% 
  left_join(df1,by = c(col2 = "con"))

#>   pos col1 col2 assoc.x assoc.y
#> 1  -3    A    B     2.0    3.40
#> 2  -2    B    T     3.4   -0.40
#> 3  -1    B    D     3.4   -2.30
#> 4   0    T    A    -0.4    2.00
#> 5   1    T    E    -0.4   -1.00
#> 6   2    D    C    -2.3    4.60
#> 7   3    E    F    -1.0    0.48

Or a double merge:

merge(merge(df1,df2,by.x = "con",by.y = "col1"),df2,by.x = "con",by.y = "col2")
#>   con assoc pos.x col2 pos.y col1
#> 1   A   2.0    -3    B     0    T
#> 2   B   3.4    -2    T    -3    A
#> 3   B   3.4    -1    D    -3    A
#> 4   D  -2.3     2    C    -1    B
#> 5   E  -1.0     3    F     1    T
#> 6   T  -0.4     0    A    -2    B
#> 7   T  -0.4     1    E    -2    B

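Do you mean merge?

Reduce(
  function(x,y) merge(x,y,by = names(df1)),
  lapply(
    grep("col",names(df2),value = TRUE),
    function(y) merge(df1,df2,by.x = "con",by.y = y)
  )
)

This is meant to cover the case where df2 has more than just col1 and col2. For the data above it gives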

  assoc con pos.x col2 pos.y col1
1  -0.4   T     0    A    -2    B
2  -0.4   T     1    E    -2    B
3  -1.0   E     3    F     1    T
4  -2.3   D     2    C    -1    B
5   2.0   A    -3    B     0    T
6   3.4   B    -2    T    -3    A
7   3.4   B    -1    D    -3    A

Using a for loop:

out <- df2
for(cn in c("col1","col2")) out <- merge(out,df1,by.x = cn,by.y = 'con',all.x = TRUE)
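
Because merge sorts the result by the key and moves the key column to the front, the loop above returns the rows in a different order, with the looked-up columns named assoc.x and assoc.y. If the original row order and the con1/con2 names from the question matter, one option is to keep the loop but do the lookup with match. This is a sketch rather than a drop-in replacement. It assumes df2 holds the values from the desired output, and the con1/con2 naming rule is an illustrative choice:

```r
df1 <- data.frame(assoc = c(2, 3.4, 4.6, -2.3, -1, 0.48, -0.4),
                  con = c("A", "B", "C", "D", "E", "F", "T"))
# Assumption: df2 values are read off the desired output in the question
df2 <- data.frame(pos = c("-3", "-2", "-1", "0", "1", "2", "3"),
                  col1 = c("A", "B", "B", "T", "T", "D", "E"),
                  col2 = c("B", "T", "D", "A", "E", "C", "F"))

out <- df2
for (cn in c("col1", "col2")) {
  # New column name: col1 -> con1, col2 -> con2
  new_name <- sub("col", "con", cn)
  # match() gives the row of df1 for each key; unmatched keys become NA
  out[[new_name]] <- df1$assoc[match(out[[cn]], df1$con)]
}
out
```

Keys that do not occur in df1$con end up as NA, which mirrors the all.x = TRUE behaviour of the merge version.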