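Question
My goal is to sum certain columns of a data frame and store the total in a new column.
Say I have the following data.frame:
df <- data.frame(names=c("a","b","c","d","e","f"),wb01=c(1,1,0,1,1,0),wb02=c(0,0,0,0,1,1),wb03=c(0,0,1,1,1,1),wb04=c(1,1,0,1,1,1),wb05=c(1,0,1,0,0,1),wb06=c(1,1,1,1,1,1))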
rownames(df) <- df$names
wb01 wb02 wb03 wb04 wb05 wb06
a 1 0 0 1 1 1
b 1 0 0 1 0 1
c 0 0 1 0 1 1
d 1 0 1 1 0 1
e 1 1 1 1 0 1
f 0 1 1 1 1 1
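I want to choose the columns to add up with a vector that holds their names. (My real data frame and the number of columns I will select are both very large, and the columns are not contiguous, i.e. I cannot just select columns 3-5, and I do not want to type out every column name because there will be more than 2k of them.)
But back to the example, these are the columns I want to sum:
genelist <- c("wb02","wb03","wb06")
So the result would look like this: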
wb01 wb02 wb03 wb04 wb05 wb06 sum_genelist
a 1 0 0 1 1 1 1
b 1 0 0 1 0 1 1
c 0 0 1 0 1 1 2
d 1 0 1 1 0 1 2
e 1 1 1 1 0 1 3
f 0 1 1 1 1 1 3
Thanks for any help or hints!
Answer
We can use rowSums:
df$sum_genelist <- rowSums(df[intersect(genelist,names(df))],na.rm = TRUE)
df
# names wb01 wb02 wb03 wb04 wb05 wb06 sum_genelist
#a a 1 0 0 1 1 1 1
#b b 1 0 0 1 0 1 1
#c c 0 0 1 0 1 1 2
#d d 1 0 1 1 0 1 2
#e e 1 1 1 1 0 1 3
#f f 0 1 1 1 1 1 3
where
genelist <- c('wb02','wb03','wb06')
Data
df <- structure(list(names = c("a","b","c","d","e","f"),wb01 = c(1,1,0,1,1,0),wb02 = c(0,0,0,0,1,1),wb03 = c(0,0,1,1,1,1),wb04 = c(1,1,0,1,1,1),wb05 = c(1,0,1,0,0,1),wb06 = c(1,1,1,1,1,1)),row.names = c("a","b","c","d","e","f"),class = "data.frame")
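To see why the call above wraps the name vector in intersect(), here is a minimal sketch on a made-up two-row data frame. The names toy, wanted, present and missing_names are hypothetical and not part of the original answer; the only assumption is that the name vector may contain entries that are not columns.

```r
# Toy data, invented for this sketch only
toy <- data.frame(wb01 = c(1, 0), wb02 = c(0, 1), wb03 = c(NA, 1))
wanted <- c("wb02", "wb03", "not_a_column")

# Indexing directly with a name that does not exist raises an error
direct <- tryCatch(toy[wanted], error = function(e) "error")

# intersect() keeps only the requested names that are real columns
present <- intersect(wanted, names(toy))

# setdiff() reports what was dropped, which is handy for a 2k-long list
missing_names <- setdiff(wanted, names(toy))

# na.rm = TRUE treats NA as absent, so row 1 sums to 0 and row 2 to 2
toy$total <- rowSums(toy[present], na.rm = TRUE)
```

Checking missing_names before summing is optional, but it makes a typo in a long name vector visible instead of silently shrinking the sum.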
Alternatively, with dplyr you can use any_of inside select to pick only those columns that are actually present in the data.
genelist <- c('wb02','wb03','wb06','a')
library(dplyr)
df %>% mutate(sum_genelist = rowSums(select(.,any_of(genelist))))
# names wb01 wb02 wb03 wb04 wb05 wb06 sum_genelist
#1 a 1 0 0 1 1 1 1
#2 b 1 0 0 1 0 1 1
#3 c 0 0 1 0 1 1 2
#4 d 1 0 1 1 0 1 2
#5 e 1 1 1 1 0 1 3
#6 f 0 1 1 1 1 1 3
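As a rough illustration of what any_of buys you, the sketch below contrasts it with its strict counterpart all_of on a made-up data frame. It assumes dplyr 1.0.0 or later (for across); the names toy, wanted, lenient and strict are hypothetical.

```r
library(dplyr)

# Toy data, invented for this sketch only
toy <- data.frame(wb01 = c(1, 0), wb02 = c(0, 1), wb03 = c(1, 1))
wanted <- c("wb02", "wb03", "not_a_column")

# any_of() silently skips names that are not columns of the data
lenient <- toy %>%
  mutate(total = rowSums(across(any_of(wanted))))

# all_of() insists that every name exists and errors otherwise
strict <- tryCatch(
  toy %>% mutate(total = rowSums(across(all_of(wanted)))),
  error = function(e) "missing column"
)
```

Here lenient$total holds 1 and 2, while strict ends up as the string "missing column". Which helper fits depends on whether a missing name should be tolerated or treated as a mistake.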