Linear regression: features column not set

Problem description

I'm trying to write a linear regression to analyse my data. I'm using Scala, and this is basically what I'm doing:

import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.regression.LinearRegressionModel
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.{Pipeline,PipelineModel}

val training_data_finalised = training.drop("COUNTRY_REGION","PROVINCE_STATE","DATE")
val featuresArray = Array("Active","Confirmed","Deaths","Recovered","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC","AVG_PARKS_CHANGE_PERC","AVG_RESIDENTIAL_CHANGE_PERC","AVG_RETAIL_AND_RECREATION_CHANGE_PERC","AVG_TRANSIT_STATIONS_CHANGE_PERC","AVG_WORKPLACES_CHANGE_PERC","Active_1_day","Active_2_day","Active_7_day","Active_14_day","Confirmed_1_day","Confirmed_2_day","Confirmed_7_day","Confirmed_14_day","Deaths_1_day","Deaths_2_day","Deaths_7_day","Deaths_14_day","Recovered_1_day","Recovered_2_day","Recovered_7_day","Recovered_14_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_1_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_2_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_7_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_14_day","AVG_PARKS_CHANGE_PERC_1_day","AVG_PARKS_CHANGE_PERC_2_day","AVG_PARKS_CHANGE_PERC_7_day","AVG_PARKS_CHANGE_PERC_14_day","AVG_RESIDENTIAL_CHANGE_PERC_1_day","AVG_RESIDENTIAL_CHANGE_PERC_2_day","AVG_RESIDENTIAL_CHANGE_PERC_7_day","AVG_RESIDENTIAL_CHANGE_PERC_14_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_1_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_2_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_7_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_14_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_1_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_2_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_7_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_14_day","AVG_WORKPLACES_CHANGE_PERC_1_day","AVG_WORKPLACES_CHANGE_PERC_2_day","AVG_WORKPLACES_CHANGE_PERC_7_day","AVG_WORKPLACES_CHANGE_PERC_14_day")

val assembler = new VectorAssembler()
  .setInputCols(featuresArray)
  .setOutputCol("features")

val lr = new LinearRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setFeaturesCol("features")   // setting features column
  .setLabelCol("Deaths")       // setting label column

val pipeline = new Pipeline().setStages(Array(assembler,lr))

//fitting the model
val lrModel = pipeline.fit(training_data_finalised.na.fill(0))

But how do I get the coefficient values?

Any suggestions?

To add: I also tried doing this the way the Spark documentation does (https://spark.apache.org/docs/latest/ml-classification-regression.html):

val lr = new LinearRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

// Fit the model
val lrModel = lr.fit(training)

// Print the coefficients and intercept for linear regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

But for some reason this gives me:

IllegalArgumentException: features does not exist. Available: Active,Confirmed,Deaths
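
A likely explanation, offered as a sketch rather than a confirmed answer: in the second snippet, lr.fit(training) is called on the raw DataFrame, which has never been through the VectorAssembler, so the default "features" column that LinearRegression looks for is missing. In the first snippet the pipeline does build that column, but pipeline.fit returns a PipelineModel, not a LinearRegressionModel, so the coefficients have to be read from its last stage. The example below assumes a local SparkSession and a made-up three-column dataset; the object name CoefficientsSketch, the helper fitAndExtract, and the toy rows are hypothetical and only there for illustration.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
import org.apache.spark.sql.SparkSession

object CoefficientsSketch {
  // Hypothetical feature list; the real one would be the full featuresArray.
  val featureCols: Array[String] = Array("Active", "Confirmed")

  // Fits assembler + regression as a pipeline and returns the regression stage.
  def fitAndExtract(spark: SparkSession): LinearRegressionModel = {
    import spark.implicits._

    // Made-up toy rows standing in for the real training data.
    val training = Seq(
      (10.0, 12.0, 1.0),
      (20.0, 25.0, 2.0),
      (35.0, 41.0, 4.0),
      (50.0, 60.0, 7.0),
      (80.0, 95.0, 11.0)
    ).toDF("Active", "Confirmed", "Deaths")

    // The assembler creates the "features" vector column the regression expects.
    val assembler = new VectorAssembler()
      .setInputCols(featureCols)
      .setOutputCol("features")

    val lr = new LinearRegression()
      .setMaxIter(10)
      .setRegParam(0.3)
      .setElasticNetParam(0.8)
      .setFeaturesCol("features")
      .setLabelCol("Deaths")

    val pipelineModel = new Pipeline()
      .setStages(Array(assembler, lr))
      .fit(training.na.fill(0))

    // The fitted regression is the last stage of the PipelineModel.
    pipelineModel.stages.last.asInstanceOf[LinearRegressionModel]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("lr-coefficients-sketch")
      .getOrCreate()

    val lrModel = fitAndExtract(spark)
    println(s"Intercept: ${lrModel.intercept}")
    // Coefficients come back in the same order as the assembler's input columns.
    featureCols.zip(lrModel.coefficients.toArray).foreach {
      case (name, coef) => println(s"$name: $coef")
    }

    spark.stop()
  }
}
```

Applied to the code in the question, the same cast should work on the value returned by pipeline.fit(...), for example lrModel.stages.last.asInstanceOf[LinearRegressionModel].coefficients. Separately, featuresArray in the question lists "Deaths", which is also the label column, so it is probably worth dropping it from the inputs.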
