How do I pivot wider and count the occurrences of a pair across two columns?

Problem description

Here is the data frame:

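# The original dput() was truncated; the values here are reconstructed
# from the 20-row output printed further down in this thread.
dt <- structure(list(
    ID = c(1,1,1,2,2,2,2,2,2,3,4,5,5,5,6,6,6,7,7,7),
    V1 = c("ABC","ABC","DEF","GHI","GHI","GHI","JKL","JKL","DEF","ABC",
           "MNO","GHI","GHI","ABC","DEF","DEF","GHI","MNO","MNO","ABC"),
    V2 = c("DEF","MNO","MNO","JKL","DEF","ABC","DEF","ABC","ABC","JKL",
           "JKL","ABC","DEF","DEF","GHI","MNO","MNO","ABC","JKL","JKL")
),row.names = c(NA,-20L),class = c("data.table","data.frame"))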

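For example, ABC appears 5 times in column V1 and DEF appears 5 times in column V2, but the two are paired only three times. I want to create a count column that counts each pair regardless of which column (V1 or V2) its members appear in.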

Solution

Update

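library(data.table)
library(igraph)
library(dplyr)

# Build an undirected graph from (V1, V2), keep the upper triangle of its
# adjacency matrix as pair counts, then join those counts back onto dt.
dt[,c(2,3,1)] %>%
    graph_from_data_frame(directed = FALSE) %>%
    get.adjacency(type = "upper") %>%
    graph_from_adjacency_matrix(weighted = TRUE) %>%
    get.data.frame() %>%
    setNames(c("V1","V2","cnts")) %>%
    inner_join(dt) %>%
    arrange(ID) %>%
    select(ID,everything())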

which gives

   ID  V1  V2 cnts
1   1 ABC DEF    3
2   1 ABC MNO    2
3   1 DEF MNO    2
4   2 GHI JKL    1
5   3 ABC JKL    3
6   5 ABC DEF    3
7   6 DEF GHI    3
8   6 DEF MNO    2
9   6 GHI MNO    1
10  7 ABC JKL    3

I think you can try the igraph option below

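library(data.table)  # needed for the dt[,-"ID"] column syntax
library(igraph)

# Adjacency matrix of the undirected graph: entry [i, j] is the number of
# rows in which codes i and j are paired, in either column order.
get.adjacency(
    graph_from_data_frame(dt[,-"ID"],directed = FALSE),
    sparse = FALSE
)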

which gives

    ABC DEF GHI JKL MNO
ABC   0   3   2   3   2
DEF   3   0   3   1   2
GHI   2   3   0   1   1
JKL   3   1   1   0   2
MNO   2   2   1   2   0

If you want to add a column with the counts, you can try

transform(
    dt,cnts = ave(ID,pmin(V1,V2),pmax(V1,V2),FUN = length)
)

which gives

   ID  V1  V2 cnts
 1:  1 ABC DEF    3
 2:  1 ABC MNO    2
 3:  1 DEF MNO    2
 4:  2 GHI JKL    1
 5:  2 GHI DEF    3
 6:  2 GHI ABC    2
 7:  2 JKL DEF    1
 8:  2 JKL ABC    3
 9:  2 DEF ABC    3
10:  3 ABC JKL    3
11:  4 MNO JKL    2
12:  5 GHI ABC    2
13:  5 GHI DEF    3
14:  5 ABC DEF    3
15:  6 DEF GHI    3
16:  6 DEF MNO    2
17:  6 GHI MNO    1
18:  7 MNO ABC    2
19:  7 MNO JKL    2
20:  7 ABC JKL    3
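
To see why the pmin()/pmax() grouping counts a pair regardless of column order, here is a minimal base R sketch. The `toy` data frame and the helper names `lo` and `hi` are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy data (not the question's dt)
toy <- data.frame(
    V1 = c("ABC", "DEF", "ABC", "GHI"),
    V2 = c("DEF", "ABC", "JKL", "ABC"),
    stringsAsFactors = FALSE
)

# pmin()/pmax() work element-wise on character vectors, so every row gets
# the same (lo, hi) key whether the pair was written A-B or B-A.
lo <- pmin(toy$V1, toy$V2)
hi <- pmax(toy$V1, toy$V2)

# ave() applies FUN within each (lo, hi) group and returns one value per row,
# so length() yields the number of rows sharing that unordered pair.
toy$cnts <- ave(seq_len(nrow(toy)), lo, hi, FUN = length)
toy
```

In this toy data, ABC/DEF and DEF/ABC end up in the same group and both rows get a count of 2, while the other two pairs each get 1.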

In base R, you can do the following:

codes = unique(c(dt$V1,dt$V2))
output = data.frame(code1 = character(0),code2 = character(0),occurances = integer(0))
for(i1 in seq_along(codes)){
    code1 = codes[i1]
    # i2 only runs up to i1, so each unordered pair is visited once
    for(i2 in seq_len(i1)){
        code2 = codes[i2]
        # count rows that contain the pair in either column order
        count = sum((code1==dt$V1 & code2 == dt$V2) | (code1==dt$V2 & code2 == dt$V1))
        output = rbind(output,data.frame(code1=code1,code2=code2,occurances=count))
    }
}
output

Output:

   code1 code2 occurances
1    ABC   ABC          0
2    DEF   ABC          3
3    DEF   DEF          0
4    GHI   ABC          2
5    GHI   DEF          3
6    GHI   GHI          0
7    JKL   ABC          3
8    JKL   DEF          1
9    JKL   GHI          1
10   JKL   JKL          0
11   MNO   ABC          2
12   MNO   DEF          2
13   MNO   GHI          1
14   MNO   JKL          2
15   MNO   MNO          0

You can also use table() and add the result to its transpose:

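# On a data.table, dt[-1] drops the first row rather than the ID column,
# so the two columns are passed explicitly.
tbl <- table(V1 = dt$V1, V2 = dt$V2)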

(cnts <- tbl + `diag<-`(t(tbl),0))

     V2
V1    ABC DEF GHI JKL MNO
  ABC   0   3   2   3   2
  DEF   3   0   3   1   2
  GHI   2   3   0   1   1
  JKL   3   1   1   0   2
  MNO   2   2   1   2   0
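
One caveat with this route: adding a table to its transpose only works when the table is square with rows and columns in the same order, which holds here because every code occurs in both V1 and V2. A possible safeguard, sketched on made-up data, is to give both columns the same factor levels first. The names `toy`, `lv` and `sym` are hypothetical.

```r
# Hypothetical toy data in which JKL never appears in V1
toy <- data.frame(
    V1 = c("ABC", "DEF", "ABC", "GHI"),
    V2 = c("DEF", "ABC", "JKL", "ABC"),
    stringsAsFactors = FALSE
)

# Shared levels force a square table even if a code is missing from a column
lv  <- sort(unique(c(toy$V1, toy$V2)))
tbl <- table(V1 = factor(toy$V1, levels = lv),
             V2 = factor(toy$V2, levels = lv))

# Ordered counts plus their mirror image give order-insensitive counts.
# Zeroing the transpose's diagonal keeps self-pairs from being counted twice.
sym <- tbl + `diag<-`(t(tbl), 0)
sym
```

Without the shared levels, `tbl` would be 3 x 4 for this toy data and the addition would fail with a non-conformable arrays error.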

Deduplicate and convert to a data frame:

cnts[upper.tri(cnts)] <- NA

subset(as.data.frame.table(cnts),!is.na(Freq))

    V1  V2 Freq
1  ABC ABC    0
2  DEF ABC    3
3  GHI ABC    2
4  JKL ABC    3
5  MNO ABC    2
7  DEF DEF    0
8  GHI DEF    3
9  JKL DEF    1
10 MNO DEF    2
13 GHI GHI    0
14 JKL GHI    1
15 MNO GHI    1
19 JKL JKL    0
20 MNO JKL    2
25 MNO MNO    0
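
If the goal is the per-row count column from the question rather than a pair table, one possible approach is to look up each row's pair in the symmetric count matrix before its upper triangle is set to NA. The sketch below uses made-up data with no self-pairs. The names `toy`, `lv` and `sym` are hypothetical.

```r
# Hypothetical toy data (not the question's dt); assumes no row has V1 == V2
toy <- data.frame(
    V1 = c("ABC", "DEF", "ABC", "GHI"),
    V2 = c("DEF", "ABC", "JKL", "ABC"),
    stringsAsFactors = FALSE
)

# Square, symmetric pair-count matrix with shared levels on both axes
lv  <- sort(unique(c(toy$V1, toy$V2)))
tbl <- table(factor(toy$V1, levels = lv), factor(toy$V2, levels = lv))
sym <- tbl + t(tbl)

# A two-column character matrix indexes by (row name, column name),
# so this picks one count per row of toy.
toy$cnts <- as.vector(sym[cbind(toy$V1, toy$V2)])
toy
```

Because `sym` is symmetric, the lookup returns the same count for ABC/DEF and DEF/ABC, so no reordering of the columns is needed.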