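R: "argument is of length zero" and an empty plot

Question

I am using the R programming language and trying to follow the tutorial here: https://cran.r-project.org/web/packages/lime/vignettes/Understanding_lime.html

I tried to create my own data to replicate the tutorial: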

#load libraries
library(MASS)
library(lime)
library(randomForest)

#create data
var_1 <- rnorm(100,1,4)
var_2 <- rnorm(100,10,5)
var_3 <- c("0","2","4")
var_3 <- sample(var_3,100,replace=TRUE,prob=c(0.3,0.6,0.1))

response <- c("1","0")
response <- sample(response,100,replace=TRUE,prob=c(0.3,0.7))

#put them into a data frame called "f"
f <- data.frame(var_1,var_2,var_3,response)

#declare var_3 and response_variable as factors
f$var_3 = as.factor(f$var_3)
f$response = as.factor(f$response)

# run random forest on all the data except the first observation
model<-randomForest(response ~.,data = f[-1,],mtry=2,ntree=100)
model<-as_classifier(model,labels = NULL)

#run the "lime" procedure on the first observation
explainer <- lime(f[-1,],model,bin_continuous = TRUE,quantile_bins = FALSE)
explanation <- explain(f[-1,],explainer,n_labels = 1,n_features = 4)
    
#visualize the results - here is the error:
plot_features(explanation,ncol = 1)

Error in if (nrow(explanation) == 0) stop("No explanations to plot",call. = FALSE) : 
  argument is of length zero
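
For reference, the wording of this message comes from base R rather than from lime itself: `if ()` needs a single TRUE or FALSE, and `nrow()` returns NULL for any object that has no dimensions. The sketch below reproduces the same message with a stand-in object. The name `not_a_table` and the helper `is_plottable()` are hypothetical, and nothing here shows why `explanation` ended up in that state in the code above.

```r
# nrow() returns NULL for anything without a "dim" attribute
not_a_table <- NULL        # stand-in for an object that is not a data frame
n <- nrow(not_a_table)     # NULL
cond <- n == 0             # comparing NULL with 0 gives logical(0)

# if () needs exactly one TRUE/FALSE, so a zero-length condition is an error;
# tryCatch() captures the message instead of stopping the script
msg <- tryCatch(
  if (cond) "empty" else "not empty",
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical defensive check to run before plotting
is_plottable <- function(x) is.data.frame(x) && nrow(x) > 0
```

Calling `is_plottable(explanation)` before `plot_features()` is one way to tell whether the object returned by `explain()` is a non-empty data frame.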

Can someone tell me what I am doing wrong? Is it because this procedure is not meant to be run on a single observation?

Thanks

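Update: if I change this line of code:

model<-randomForest(response ~.,data = f[-1,],mtry=2,ntree=100)

to

model<-randomForest(response ~.,data = f,mtry=2,ntree=100)

The code now seems to run (this is not a big problem, since I can write f_new = f[1,] and then f = f[-1,] before running this step), but the plot is not displayed completely. Is this a problem with my graphics console? (Note: the tutorial from the website runs perfectly.)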

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] randomForest_4.6-14 lime_0.5.1          MASS_7.3-53        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5           lubridate_1.7.9      lattice_0.20-41      class_7.3-17         assertthat_0.2.1    
 [6] glmnet_4.0-2         digest_0.6.25        ipred_0.9-9          foreach_1.5.1        mime_0.9            
[11] R6_2.4.1             plyr_1.8.6           stats4_4.0.2         ggplot2_3.3.2        pillar_1.4.6        
[16] rlang_0.4.7          caret_6.0-86         rstudioapi_0.11      data.table_1.12.8    rpart_4.1-15        
[21] Matrix_1.2-18        shinythemes_1.1.2    labeling_0.3         splines_4.0.2        gower_0.2.2         
[26] stringr_1.4.0        htmlwidgets_1.5.2    munsell_0.5.0        tinytex_0.26         shiny_1.5.0         
[31] compiler_4.0.2       httpuv_1.5.4         xfun_0.15            pkgconfig_2.0.3      shape_1.4.5         
[36] htmltools_0.5.0      nnet_7.3-14          tidyselect_1.1.0     tibble_3.0.3         prodlim_2019.11.13  
[41] codetools_0.2-16     crayon_1.3.4         dplyr_1.0.2          withr_2.3.0          later_1.1.0.1       
[46] recipes_0.1.13       ModelMetrics_1.2.2.2 grid_4.0.2           nlme_3.1-149         xtable_1.8-4        
[51] gtable_0.3.0         lifecycle_0.2.0      magrittr_1.5         pROC_1.16.2          scales_1.1.1        
[56] stringi_1.4.6        farver_2.0.3         reshape2_1.4.4       promises_1.1.1       timeDate_3043.102   
[61] ellipsis_0.3.1       generics_0.0.2       vctrs_0.3.2          xgboost_1.1.1.1      lava_1.6.8          
[66] iterators_1.0.13     tools_4.0.2          glue_1.4.1           purrr_0.3.4          fastmap_1.0.1

Answer

I may have gotten it to work. Here is the plot, based on the original code I used:

#load libraries
library(MASS)
library(lime)
library(randomForest)

#create data
var_1 <- rnorm(100,1,4)
var_2 <- rnorm(100,10,5)
var_3 <- c("0","2","4")
var_3 <- sample(var_3,100,replace=TRUE,prob=c(0.3,0.6,0.1))

response <- c("1","0")
response <- sample(response,100,replace=TRUE,prob=c(0.3,0.7))

#put them into a data frame called "f"
f <- data.frame(var_1,var_2,var_3,response)

#declare var_3 and response_variable as factors
f$var_3 = as.factor(f$var_3)
f$response = as.factor(f$response)

# run random forest on all the data
model<-randomForest(response ~.,data = f,mtry=2,ntree=100)
model<-as_classifier(model,labels = NULL)

#run the "lime" procedure on the first observation
explainer <- lime(f[-1,],model,bin_continuous = TRUE,quantile_bins = FALSE)
explanation <- explain(f[-1,],explainer,n_labels = 1,n_features = 4)

#visualize the results
plot_features(explanation,ncol = 1)

I changed the code (see below):

#load libraries
library(MASS)
library(lime)
library(randomForest)

#create data
var_1<- rnorm(100,case =1:4,ncol = 1)

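I don't understand what changed, but at least the plot is displayed now. Suppose I am only interested in the first observation. I am still confused about whether these lines should be:

explainer <- lime(f[-1,],model,bin_continuous = TRUE,quantile_bins = FALSE)
explanation <- explain(f[-1,],explainer,n_labels = 1,n_features = 4)

or

explainer <- lime(f,model,bin_continuous = TRUE,quantile_bins = FALSE)
explanation <- explain(f,explainer,n_labels = 1,n_features = 4)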
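
If the goal is to explain only the first observation, one possible arrangement is to build the explainer from the training rows and pass just the held-out row to `explain()`. The following is a minimal sketch under several assumptions that are not part of the code above: the seed, the names `train`, `new_obs` and `predictors`, dropping the `response` column before calling `lime()` and `explain()` (the linked tutorial appears to pass predictor columns only), and `n_features = 3` to match the three predictors.

```r
library(lime)
library(randomForest)

set.seed(123)  # assumption: fixed seed so the sketch is repeatable

# Simulated data in the spirit of the question
f <- data.frame(
  var_1 = rnorm(100, 1, 4),
  var_2 = rnorm(100, 10, 5),
  var_3 = factor(sample(c("0", "2", "4"), 100, replace = TRUE,
                        prob = c(0.3, 0.6, 0.1))),
  response = factor(sample(c("1", "0"), 100, replace = TRUE,
                           prob = c(0.3, 0.7)))
)

train   <- f[-1, ]                # every row except the first
new_obs <- f[1, , drop = FALSE]   # the single row to explain

# Assumption: only predictor columns are handed to lime() and explain()
predictors <- setdiff(names(f), "response")

# Fit on the training rows only, then mark the model as a classifier for lime
model <- randomForest(response ~ ., data = train, mtry = 2, ntree = 100)
model <- as_classifier(model, labels = NULL)

# The explainer learns feature distributions from the training rows;
# explain() is then applied to the held-out row only
explainer   <- lime(train[, predictors], model,
                    bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(new_obs[, predictors], explainer,
                       n_labels = 1, n_features = 3)

plot_features(explanation, ncol = 1)
```

With this split, `explanation` should describe a single case, so the plot contains one panel instead of one panel per training row.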

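I am also not sure what the difference is between "Probability" and "Explanation Fit". I assume "Probability" is the probability produced by the random forest model, while "Explanation Fit" measures how well the LIME model explains the prediction.

(If anyone knows about this, could you comment below? Thanks)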
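
As a rough illustration of that distinction, the sketch below computes both quantities by hand for a made-up model: the probability is what the black-box model predicts for the observation, and the fit is the R-squared of a simple weighted linear model trained on perturbed copies of that observation. This is a conceptual sketch only. The function `black_box()`, the point `x0`, the perturbation scale and the kernel are all hypothetical, and lime's own implementation differs in its details (for example, it fits its local model with glmnet rather than `lm()`).

```r
set.seed(1)

# Hypothetical black-box "model": returns the probability of class "1"
black_box <- function(x1, x2) plogis(0.8 * x1 - 0.3 * x2)

# The observation to explain
x0 <- c(x1 = 1, x2 = 2)

# "Probability": the black-box prediction for x0 itself
prob <- black_box(x0[["x1"]], x0[["x2"]])

# Perturb the observation and query the black box on the perturbed points
n <- 500
perturbed <- data.frame(
  x1 = rnorm(n, x0[["x1"]], 1),
  x2 = rnorm(n, x0[["x2"]], 1)
)
perturbed$y <- black_box(perturbed$x1, perturbed$x2)

# Weight each perturbed point by its closeness to x0
# (assumption: exponential kernel with width 1)
d <- sqrt((perturbed$x1 - x0[["x1"]])^2 + (perturbed$x2 - x0[["x2"]])^2)
w <- exp(-d^2)

# Local surrogate: a weighted linear model around x0
surrogate <- lm(y ~ x1 + x2, data = perturbed, weights = w)

# "Explanation fit": how well the surrogate reproduces the black box locally
fit <- summary(surrogate)$r.squared

print(c(probability = prob, explanation_fit = fit))
```

In the data frame returned by `explain()`, the corresponding columns are `label_prob` and `model_r2`. A low fit suggests the simple local model is a poor approximation near that observation, so its feature weights deserve less trust.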
