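Problem description
I have a data frame with hundreds of samples and thousands of variables, but here I give a simple data frame (my_data) for illustration. I want to get each gene's percentage of the total gene count, according to the cluster the gene belongs to (a gene can be in more than one cluster). I know how to get each percentage with data frame operations. However, since I am new to coding, I am trying to write a function to get the percentages. Can anyone help me use my function (percent) to get the percentage for each gene? The result should be the values in the "percent" column. Many thanks.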
gene = c("CD63","PTN","MT2A","PTGDS","DBI","TIMP1","COX6C","APLP2","GPC1")
gene_count = c(10,15,5,10,25,5)
cluster = c(1,2,3,7,8,9,6,4 )
percent = c(0.1,0.15,0.5,0.1,0.25,0.05,0.05)
my_data = data.frame(gene,gene_count,cluster,percent)
my_data
percent = function(gene,cluster){
for (gene in c(data$gene)){
if (data$gene == gene & data$cluster == cluster)
print(data$gene_count[which(data$gene == gene & data$cluster == cluster)]/sum(data$gene_count))
else print("Gene is not expressed in this cluster")
}
}
Solution
You can write the function like this:
percent <- function(data,my_gene,my_cluster) {
sub_data <- subset(data,gene == my_gene & cluster == my_cluster)
if(nrow(sub_data)) sum(sub_data$gene_count)/sum(data$gene_count)
else cat("Gene is not expressed in this cluster")
}
percent(my_data,'PTN',2)
#[1] 0.15
percent(my_data,'ABC',2)
#Gene is not expressed in this cluster
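If the goal is to fill the whole percent column rather than look up one gene at a time, the same ratio can be computed for every row in one vectorized step. A minimal sketch, assuming a small made-up data frame (example_data and its values are hypothetical, not the asker's my_data):

```r
# Hypothetical example data for illustration only; each row is one gene/cluster pair.
example_data <- data.frame(
  gene       = c("CD63", "PTN", "MT2A", "PTGDS", "DBI"),
  gene_count = c(10, 15, 5, 5, 15),
  cluster    = c(1, 2, 3, 3, 4)
)

# Share of each row's count in the overall total, computed for all rows at once.
example_data$percent <- example_data$gene_count / sum(example_data$gene_count)
example_data
```

Because the division is vectorized, no loop or if/else is needed, and the percent column always sums to 1.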
This might help.
percent_in_total
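library(dplyr)
percent_in_total = function(data,gene_in,cluster_in){
  # rows matching the requested gene and cluster
  matched <- data %>%
    filter(gene == gene_in & cluster == cluster_in)
  # divide by the total of the data's own gene_count column, not a global vector
  matched[["gene_count"]]/sum(data$gene_count)
}
percent_in_total(my_data,"PTN",2)
#[1] 0.15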
# data.table version
library(data.table)
percent_in_total = function(data,gene_in,cluster_in){
  # setDT() converts data to a data.table by reference
  setDT(data)[,.SD[gene == gene_in & cluster == cluster_in,gene_count] / sum(gene_count)]
}
percent_in_total(my_data,"PTN",2)
#[1] 0.15
percent_in_cluster
I prefer the data.table syntax. To explain the process for gene MT2A: .SD[gene == gene_in & cluster == cluster_in, gene_count] is that gene's count, gene_count = 5, and sum(.SD[cluster == cluster_in, gene_count]) is the total gene count in cluster = 3, i.e. 5+5.
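Putting those two expressions together, a percent_in_cluster function could look roughly like the sketch below. This is an assumption about how the pieces combine, not necessarily the original code, and example_dt is made-up data for illustration:

```r
library(data.table)

# Hypothetical example data for illustration only (not the asker's my_data).
example_dt <- data.table(
  gene       = c("CD63", "PTN", "MT2A", "PTGDS", "DBI"),
  gene_count = c(10, 15, 5, 5, 15),
  cluster    = c(1, 2, 3, 3, 4)
)

# Share of one gene's count within its own cluster (assumed definition).
percent_in_cluster <- function(data, gene_in, cluster_in) {
  dt <- as.data.table(data)  # work on a copy so the caller's object is untouched
  dt[, .SD[gene == gene_in & cluster == cluster_in, gene_count] /
       sum(.SD[cluster == cluster_in, gene_count])]
}

percent_in_cluster(example_dt, "MT2A", 3)
#[1] 0.5
```

Here MT2A has a count of 5 and cluster 3 holds 5+5 in total, so the result is 5/10. Using as.data.table() instead of setDT() is a design choice to avoid changing the input by reference.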