A general solution for analyzing and plotting two data frames of different lengths?

Problem description

Could you help me?

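I am writing code in R to automate the null model analysis of multiple networks. First, the code reads multiple TXT matrices into R. Second, it computes a topological metric for each network. Third, it randomizes each network N times using a null model. Fourth, it computes the same topological metric for all randomized versions of the original matrices.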

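In the fifth and final step, the idea is to compare the observed score against the distribution of randomized scores. First, by simply counting how many randomized scores are above or below the observed score, in order to estimate P-values. Second, by plotting the distribution of randomized scores as a density and adding a vertical line to show the observed score.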

Here is an example of the data frames that need to be analyzed:

networks <- paste("network", 1:3, sep = "")
randomizations <- 1:10

observed.ex <- data.frame(network = networks, observed = runif(3, min = 0, max = 1))

randomized.ex <- data.frame(network = sort(rep(networks, 10)),
                            randomization = rep(randomizations, 3),
                            randomized = rnorm(length(networks) * length(randomizations), mean = 0.5, sd = 0.1))

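In the first step of the final analysis, the code estimates the P-values by doing a simple count. As you can see, I need to make a copy of the calculation calls for each network: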

randomized.network1 <- subset(randomized.ex,network == "network1")
sum(randomized.network1$randomized >= observed.ex$observed[1]) /
    length(randomized.network1$randomized)
sum(randomized.network1$randomized <= observed.ex$observed[1]) /
    length(randomized.network1$randomized)

randomized.network2 <- subset(randomized.ex,network == "network2")
sum(randomized.network2$randomized >= observed.ex$observed[2]) /
    length(randomized.network2$randomized)
sum(randomized.network2$randomized <= observed.ex$observed[2]) /
    length(randomized.network2$randomized)

randomized.network3 <- subset(randomized.ex,network == "network3")
sum(randomized.network3$randomized >= observed.ex$observed[3]) /
    length(randomized.network3$randomized)
sum(randomized.network3$randomized <= observed.ex$observed[3]) /
    length(randomized.network3$randomized)

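In the second step of the final analysis, the code creates the density plots. As you can see, I need to make a copy of the vertical line call for each network: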

library(ggplot2)  # ggplot, geom_density, facet_grid, geom_vline
library(dplyr)    # filter

ggplot(randomized.ex, aes(randomized)) +
    geom_density() +
    facet_grid(network ~ .) +
    geom_vline(data = filter(randomized.ex, network == "network1"), aes(xintercept = observed.ex$observed[1]), colour = "red") +
    geom_vline(data = filter(randomized.ex, network == "network2"), aes(xintercept = observed.ex$observed[2]), colour = "red") +
    geom_vline(data = filter(randomized.ex, network == "network3"), aes(xintercept = observed.ex$observed[3]), colour = "red")

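Is there a way to make this final analysis more general, so that no matter how many networks are read in at the start, it always performs the same calculations and plots?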

Thank you very much!

Solution

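It looks like this can be wrapped neatly in an lapply loop that iterates over each file. How does the following work for you? You could also pass in the file names instead of file numbers (currently 1:3) and "read" in the TXT matrix on the first line of the function.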

library(dplyr)   # for %>%, group_by, and summarize
library(ggplot2) # for the density plot further down
output <- lapply(1:3,function(network_num){
  network <- paste0("network",network_num)
  n_randomizations <- 10
  observed.ex <- runif(1)
  randomized.ex <- rnorm(n_randomizations,mean = 0.5,sd = 0.1)

  return(data.frame(network=network,observed=observed.ex,randomized=randomized.ex))
}) %>% do.call(what = rbind)
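
If the networks live in TXT files, the same loop can run over file names instead of numbers. What follows is only a sketch under assumptions that are not in the original post: the files are whitespace-separated numeric matrices that `read.table` can read, `degree_sd` is a hypothetical stand-in for whatever topological metric is really computed, and `shuffle_cells` is a placeholder null model that merely permutes the matrix cells. The two temporary files are written only so that the example runs on its own.

```r
# Hypothetical metric (stand-in for the real one): spread of the row sums
degree_sd <- function(m) sd(rowSums(m))

# Hypothetical null model: permute all cells, keeping the matrix dimensions
shuffle_cells <- function(m) matrix(sample(m), nrow = nrow(m), ncol = ncol(m))

# Write two small example matrices to disk so the sketch is self-contained
set.seed(1)
files <- file.path(tempdir(), c("network1.txt", "network2.txt"))
for (f in files) {
  m <- matrix(rbinom(20, size = 1, prob = 0.4), nrow = 4)
  write.table(m, f, row.names = FALSE, col.names = FALSE)
}

n_randomizations <- 10

output <- do.call(rbind, lapply(files, function(f) {
  m <- as.matrix(read.table(f))  # read the TXT matrix for this network
  data.frame(
    network    = tools::file_path_sans_ext(basename(f)),  # e.g. "network1"
    observed   = degree_sd(m),                            # recycled to every row
    randomized = replicate(n_randomizations, degree_sd(shuffle_cells(m)))
  )
}))
```

The result has the same three columns as the simulated `output` above, so any summary or plot written against those columns should apply to it as well.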

output %>%
  group_by(network) %>%
  summarize(p_value=mean(observed>=randomized))
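
The same per-network summary can also be applied to the two data frames from the question, whose lengths differ, if the observed scores are first merged onto the randomized ones by `network`. Here is a minimal sketch in base R only, with the example data regenerated so that it runs on its own. The `set.seed` call and the column names `p_upper` and `p_lower` are additions for illustration. It reports both tails that the question computes by hand:

```r
set.seed(42)
networks <- paste0("network", 1:3)
observed.ex <- data.frame(network = networks, observed = runif(3))
randomized.ex <- data.frame(network = rep(networks, each = 10),
                            randomization = rep(1:10, times = 3),
                            randomized = rnorm(30, mean = 0.5, sd = 0.1))

# Attach each network's observed score to all of its randomized scores
merged <- merge(randomized.ex, observed.ex, by = "network")

# One row per network: share of randomized scores at or above,
# and at or below, the observed score
p_values <- do.call(rbind, lapply(split(merged, merged$network), function(d) {
  data.frame(network = d$network[1],
             p_upper = mean(d$randomized >= d$observed),
             p_lower = mean(d$randomized <= d$observed))
}))
rownames(p_values) <- NULL
p_values
```

Because the merge is keyed on `network`, nothing here depends on how many networks were read in.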

ggplot(output) +
  geom_density(aes(randomized)) +
  facet_grid(network~.) +
  geom_vline(aes(xintercept = observed),col="red")
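
If the observed and randomized scores stay in two separate data frames, as in the question, a single `geom_vline` layer is still enough, provided it is given the observed data frame as its own `data`. Because that data frame has a `network` column, ggplot2 draws each line only in the matching facet. A sketch, with the example data regenerated so that it runs on its own (the `set.seed` call is an addition):

```r
library(ggplot2)

set.seed(42)
networks <- paste0("network", 1:3)
observed.ex <- data.frame(network = networks, observed = runif(3))
randomized.ex <- data.frame(network = rep(networks, each = 10),
                            randomized = rnorm(30, mean = 0.5, sd = 0.1))

p <- ggplot(randomized.ex, aes(randomized)) +
  geom_density() +
  facet_grid(network ~ .) +
  # One row per network in observed.ex, so one red line per facet
  geom_vline(data = observed.ex, aes(xintercept = observed), colour = "red")
p
```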
