问题描述
我是R语言的新秀,我想基于一行中同时出现的元素创建一个共现矩阵。
理想结果的基本示例
说您有这张桌子:
df <- data.frame(ID = c(1,2,3),V1 = c("England","England","China"),V2 = c("Greece","Greece")
)
也就是说:
ID V1 V2
1: 1 England Greece
2: 2 England England
3: 3 China Greece
我的同现矩阵应如下所示:
Country China England Greece
China 0 0 1 #China & Greece co-occur in row 3
England 0 1 1 #England & England co-occur in row 2,and England and Greece in row 1
Greece 1 1 0
但是,如果我遵循此example,我会得到:
library(tidyverse)
df %>%
pivot_longer(-ID,names_to = "Category",values_to = "Country") %>%
xtabs(~ID + Country,data = .,sparse = FALSE) %>%
crossprod(.,.)
df_diag <- df %>%
pivot_longer(-ID,values_to = "Country") %>%
mutate(Country2 = Country) %>%
xtabs(~Country + Country2,sparse = FALSE) %>%
diag()
diag(df1) <- df_diag
df1
Country China England Greece
China 1 0 1
England 0 3 1
Greece 1 1 2
如果我跟随其他one,我将得到:
library(reshape2)
library(data.table)
melt(setDT(df),id.vars = "ID",measure = patterns("^V"))[nchar(value) > 0 & complete.cases(value)] -> foo
# Get distinct value (country) in each ID group (each row)
unique(foo,by = c("ID","value")) -> foo2
# https://stackoverflow.com/questions/13281303/creating-co-occurrence-matrix
# Seeing this question,you want to create a matrix with crossprod().
crossprod(table(foo2[,c(1,3)])) -> mymat
# Finally,you need to change diagonal values. If a value is equal to one,# change it to zero. Otherwise,keep the original value.
diag(mymat) <- ifelse(diag(mymat) <= 1,mymat)
mymat
value China England Greece
China 0 0 1
England 0 0 1
Greece 1 1 1
我想我缺少有关xtabs的工作方式或者也许还有跨脚架的东西?
解决方法
您需要的是:
在基数R中,您可以这样做:
a <- table(lapply(df[-1],factor,levels = sort(unique(unlist(df[-1])))))
a[lower.tri(a)] <- t(a)[lower.tri(a)]
a
V2
V1 China England Greece
China 0 0 1
England 0 1 1
Greece 1 1 0
其他库:
library(network)
as.matrix(network(df[-1],directed = FALSE))
China England Greece
China 0 0 1
England 0 1 1
Greece 1 1 0
library(igraph)
as.matrix(as_adj(graph_from_data_frame(df[-1],FALSE)))
England China Greece
England 1 0 1
China 0 0 1
Greece 1 1 0
,
您可以在底数R中使用outer
unique_vals <- sort(union(df$V1,df$V2))
co_mat <- function(x,y) +(any(df$V1 == x & df$V2 == y |
df$V2 == x & df$V1 == y))
mat <- outer(unique_vals,unique_vals,Vectorize(co_mat))
dimnames(mat) <- list(unique_vals,unique_vals)
mat
# China England Greece
#China 0 0 1
#England 0 1 1
#Greece 1 1 0