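How do I plot bar charts of factor variables in R using lapply?

Problem description

I wrote a function that creates a geom_bar for each factor variable. The bar chart shows the median price for each category.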

library(ggplot2)
library(forcats)
bar_price<-function(var,color=mycol){
  ggplot(train,aes(fct_reorder(var,SalePrice,.desc = TRUE),fill=var))+
      stat_summary(aes(y = SalePrice),fun = "median",geom = "bar")+
      geom_hline(yintercept = median(train$SalePrice),color="red")+
      scale_fill_manual(values = rep(color,15))+
      geom_label(stat = "count",aes(label = ..count..,y = ..count..),fill="white")+
      ylab("SalePrice")+
      xlab(paste(substitute(var))[3])+
      theme_bw()+
      theme(legend.position = "none")
} 
bar_price(train$MSSubClass,"#202040")

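The function bar_price works well. However, when I try to apply it over the factor variables with lapply, I get this error: Error in fct_reorder(var, SalePrice, .desc = TRUE) : length(f) == length(.x) is not TRUE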

factors <- sapply(train,function(x) is.factor(x))
factors_only<- train[,factors]
temp <- lapply(names(factors_only),bar_price)
print(temp[[1]])

This is my dataset: https://drive.google.com/file/d/1el-gAgA93EbYnM6VnDqzhT5c5uWsnKvq/view?usp=sharing

How can I solve this?

Solution

When you run this line:

bar_price(train$MSSubClass,"#202040")

note that train$MSSubClass is the column's values, not its name.

In your lapply command, however, you are passing the column names to the bar_price function:

temp <- lapply(names(factors_only),bar_price)

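Instead, you should pass the column values here as well. Inside aes(), fct_reorder() then receives a single string as var but a full-length SalePrice column, which is why the length check fails. Also, you did not pass the color argument.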
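The length mismatch can be seen without ggplot2 or forcats. The following is a minimal sketch, assuming a small made-up data frame called toy (a hypothetical stand-in, not the dataset linked in the question):

```r
# Hypothetical toy data standing in for `train`
toy <- data.frame(
  MSSubClass = factor(c("20", "60", "20", "30")),
  SalePrice  = c(100, 250, 120, 90)
)

by_name  <- names(toy)[1]    # what lapply(names(...), ...) hands over: a string
by_value <- toy[[by_name]]   # what train$MSSubClass hands over: the factor column

length(by_name)        # a single character string
length(by_value)       # one entry per row
length(toy$SalePrice)  # the length fct_reorder() expects its first argument to match
```

Only by_value has the same length as SalePrice, so only it passes the check that produced the error above.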

So try:

temp <- lapply(factors_only,bar_price,"#202040")
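This call draws the bars, but the x-axis label built by paste(substitute(var))[3] is no longer the column name, because inside lapply the argument expression is lapply's internal X[[i]] rather than train$MSSubClass. A small sketch of that effect, using a hypothetical helper label_of that copies only the label logic, and made-up toy data:

```r
# Hypothetical helper that mimics the x-axis label logic of the original bar_price()
label_of <- function(var) paste(substitute(var))[3]

# Made-up data, not the asker's dataset
toy <- data.frame(
  MSSubClass = factor(c("20", "60")),
  SalePrice  = c(100, 250)
)

# Called directly, the expression is toy$MSSubClass, so the third piece is the column name
direct <- label_of(toy$MSSubClass)

# Called through lapply, the expression is X[[i]], so the third piece is just "i"
applied <- lapply(toy, label_of)

print(direct)
print(applied)
```

This is why the revised function takes the column name as a string and uses it for the label.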

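To get the correct column name on the x-axis, I suggest changing the function slightly so that it takes the data frame and the column name: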

library(forcats)
library(ggplot2)
library(rlang)

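# data: data frame containing SalePrice; var: name of a factor column, as a string
bar_price <- function(data,var,color=mycol){
    # !!sym(var) turns the string into a column reference inside aes()
    ggplot(data,aes(fct_reorder(!!sym(var),SalePrice,.desc = TRUE),fill = !!sym(var))) +
      # bar height is the median SalePrice per category
      stat_summary(aes(y = SalePrice),fun = "median",geom = "bar")+
      # reference line at the overall median
      geom_hline(yintercept = median(data$SalePrice),color="red")+
      scale_fill_manual(values = rep(color,15))+
      # after_stat(count) is the current spelling of the deprecated ..count..
      geom_label(stat = "count",aes(label = after_stat(count),y = after_stat(count)),fill="white")+
      ylab("SalePrice")+
      xlab(var)+
      theme_bw()+
      theme(legend.position = "none")
}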

Then you can run it as follows:

temp <- lapply(names(factors_only),bar_price,data = train,color = "#202040")
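The call works because lapply forwards its extra arguments to the function, and R matches the named ones first, so each column name ends up in var. Below is a minimal sketch of that matching, using a hypothetical stub show_args with the same signature as the revised bar_price and made-up toy data; it returns its arguments instead of a plot so they are easy to inspect:

```r
# Hypothetical stub with the same argument list as the revised bar_price()
show_args <- function(data, var, color = "grey") {
  list(var = var, rows = nrow(data), color = color)
}

# Made-up data, not the asker's dataset
toy <- data.frame(
  a = factor(c("x", "y")),
  b = factor(c("u", "v")),
  SalePrice = c(1, 2)
)
cols <- c("a", "b")

# data and color are matched by name; each element of cols fills the remaining
# formal argument, var
res <- lapply(cols, show_args, data = toy, color = "#202040")

# Naming the list lets a result be looked up by column name, e.g. res[["b"]]
res <- setNames(res, cols)
str(res[["b"]])
```

The same setNames step can be applied to temp so that an individual plot is retrieved by column name rather than by position.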