Problem description
I am trying to compute the mean of every column in a data frame, grouped by label. I wrote this snippet:
#Average overallDataset by label
overallDatasetLabels <- c("label","index","nr_pix","rows_with_2","cols_with_2","rows_with_3p","cols_with_3p","height","width","left2tile","right2tile","verticalness","top2tile","bottom2tile","horizontalness","nodiagnols")
library(dplyr)
avgoverallDataset <- summarise(group_by(overallDataset,label),nr_pix_avg=mean(nr_pix))
for (val in overallDatasetLabels){
if (val %in% c("label","nr_pix")){
next
}
avgoverallDataset<-cbind(avgoverallDataset,summarise(group_by(overallDataset,label),val=mean(val)))
}
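Running it produces warnings such as:
50: In mean.default(val) : argument is not numeric or logical: returning NA
The resulting data frame looks like this:
This happens because the val variable is treated as a string, whereas I need it to be treated as "code". For example,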
avgoverallDataset<- cbind(avgoverallDataset,avgrows_with_2=mean(rows_with_2))
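should work.
How can I turn a "string" into a value that is evaluated "as code"?
Note: the duplicated label columns can be removed with the approach described in: How to remove duplicated column names in R?
Solution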
Try using across, so that you do not have to compute mean for each column in a loop.
cols <- setdiff(overallDatasetLabels,c("label","index","nr_pix"))
avgOverallDataset <- overallDataset %>%
group_by(label) %>%
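# pass na.rm through a lambda; supplying extra arguments via ... is deprecated in dplyr 1.1.0+
summarise(across(all_of(cols), ~ mean(.x, na.rm = TRUE)))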
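To make the difference concrete, here is a minimal sketch on a small made-up data frame. The `toy` data, its values, and the `_avg` naming are assumptions for illustration only and are not part of the original dataset. It shows the across approach and, for comparison, one way to keep the original loop by looking a column up from its string name with the `.data[[val]]` pronoun, which is one possible answer to the "string to code" question.

```r
library(dplyr)

# Hypothetical toy data standing in for overallDataset
toy <- data.frame(
  label  = c("a", "a", "b", "b"),
  index  = 1:4,
  nr_pix = c(10, 20, 30, 50),
  height = c(2, 4, 6, NA),
  width  = c(1, 3, 5, 7)
)

# Columns to average: everything except the grouping and index columns
cols <- setdiff(names(toy), c("label", "index"))

# Approach 1: across() applies the same summary to every selected column
avg_across <- toy %>%
  group_by(label) %>%
  summarise(across(all_of(cols), ~ mean(.x, na.rm = TRUE), .names = "{.col}_avg"))

# Approach 2: keep the loop, but index the column by its string name
avg_loop <- toy %>% distinct(label) %>% arrange(label)
for (val in cols) {
  one <- toy %>%
    group_by(label) %>%
    summarise("{val}_avg" := mean(.data[[val]], na.rm = TRUE))
  # join on label instead of cbind, so no duplicated label columns appear
  avg_loop <- left_join(avg_loop, one, by = "label")
}

print(avg_across)
print(avg_loop)
```

Both approaches should give the same per-label means here. The across version is shorter and avoids the duplicated label columns that cbind produces, which is why it is usually the more convenient choice.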