Problem description
I am trying to write a linear regression to analyze my data. I am using Scala, and this is basically what I am doing:
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.regression.LinearRegressionModel
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.{Pipeline,PipelineModel}
val training_data_finalised = training.drop("COUNTRY_REGION","PROVINCE_STATE","DATE")
val featuresArray = Array("Active","Confirmed","Deaths","Recovered","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC","AVG_PARKS_CHANGE_PERC","AVG_RESIDENTIAL_CHANGE_PERC","AVG_RETAIL_AND_RECREATION_CHANGE_PERC","AVG_TRANSIT_STATIONS_CHANGE_PERC","AVG_WORKPLACES_CHANGE_PERC","Active_1_day","Active_2_day","Active_7_day","Active_14_day","Confirmed_1_day","Confirmed_2_day","Confirmed_7_day","Confirmed_14_day","Deaths_1_day","Deaths_2_day","Deaths_7_day","Deaths_14_day","Recovered_1_day","Recovered_2_day","Recovered_7_day","Recovered_14_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_1_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_2_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_7_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_14_day","AVG_PARKS_CHANGE_PERC_1_day","AVG_PARKS_CHANGE_PERC_2_day","AVG_PARKS_CHANGE_PERC_7_day","AVG_PARKS_CHANGE_PERC_14_day","AVG_RESIDENTIAL_CHANGE_PERC_1_day","AVG_RESIDENTIAL_CHANGE_PERC_2_day","AVG_RESIDENTIAL_CHANGE_PERC_7_day","AVG_RESIDENTIAL_CHANGE_PERC_14_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_1_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_2_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_7_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_14_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_1_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_2_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_7_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_14_day","AVG_WORKPLACES_CHANGE_PERC_1_day","AVG_WORKPLACES_CHANGE_PERC_2_day","AVG_WORKPLACES_CHANGE_PERC_7_day","AVG_WORKPLACES_CHANGE_PERC_14_day")
val assembler = new VectorAssembler()
  .setInputCols(featuresArray)
  .setOutputCol("features")
val lr = new LinearRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setFeaturesCol("features") // setting features column
  .setLabelCol("Deaths") // setting label column
val pipeline = new Pipeline().setStages(Array(assembler,lr))
//fitting the model
val lrModel = pipeline.fit(training_data_finalised.na.fill(0))
But how do I get the coefficient values?
Any suggestions?
To add to this, I also tried following the Spark documentation (https://spark.apache.org/docs/latest/ml-classification-regression.html):
val lr = new LinearRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
// Fit the model
val lrModel = lr.fit(training)
// Print the coefficients and intercept for linear regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")
But for some reason, this gives me:
IllegalArgumentException: features does not exist. Available: Active,Confirmed,Deaths
Solution
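One possible way to read the coefficients, sketched under the assumption that the fitted object is the `PipelineModel` returned by `pipeline.fit(...)`: a `PipelineModel` has no `coefficients` member of its own, but it keeps its fitted stages in `stages`, and the regression stage there is a `LinearRegressionModel`. The toy data and the column names `x1`, `x2`, and `label` below are hypothetical and only make the snippet runnable on its own.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
import org.apache.spark.sql.SparkSession

// Reuses the running session in spark-shell, otherwise starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("pipeline-coefficients").getOrCreate()
import spark.implicits._

// Hypothetical toy data standing in for the real training set.
val toyTraining = Seq(
  (1.0, 2.0, 9.0),
  (2.0, 1.0, 8.0),
  (3.0, 4.0, 19.0),
  (4.0, 3.0, 18.0),
  (5.0, 7.0, 32.0),
  (6.0, 5.0, 28.0)
).toDF("x1", "x2", "label")

val featureNames = Array("x1", "x2")

val toyAssembler = new VectorAssembler()
  .setInputCols(featureNames)
  .setOutputCol("features")

val toyLr = new LinearRegression()
  .setMaxIter(10)
  .setFeaturesCol("features")
  .setLabelCol("label")

// Fitting a Pipeline returns a PipelineModel, not a LinearRegressionModel.
val pipelineModel: PipelineModel =
  new Pipeline().setStages(Array(toyAssembler, toyLr)).fit(toyTraining)

// Pick the fitted regression stage out of the pipeline's stages.
val fitted: LinearRegressionModel = pipelineModel.stages
  .collectFirst { case m: LinearRegressionModel => m }
  .getOrElse(sys.error("no LinearRegressionModel stage in the pipeline"))

// Coefficients come back in the same order as the assembler's input columns.
featureNames.zip(fitted.coefficients.toArray).foreach { case (name, coef) =>
  println(s"$name: $coef")
}
println(s"Intercept: ${fitted.intercept}")
```

Applied to the question's code, the same lookup would be done on the value returned by `pipeline.fit(...)`, with `featuresArray` zipped against the coefficients.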
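The `IllegalArgumentException` in the question most likely arises because `LinearRegression` reads a vector column named `features` by default (and a label column named `label`), while the DataFrame passed to `lr.fit` only has the raw numeric columns. A minimal sketch of fitting without a `Pipeline`, assuming a hypothetical toy DataFrame with the three columns listed in the error message: run the `VectorAssembler` first, then fit on its output.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

// Reuses the running session in spark-shell, otherwise starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("features-column").getOrCreate()
import spark.implicits._

// Hypothetical rows; only the column names are taken from the error message.
val rawTraining = Seq(
  (10.0, 12.0, 1.0),
  (25.0, 30.0, 2.0),
  (40.0, 55.0, 5.0),
  (70.0, 90.0, 8.0),
  (95.0, 130.0, 13.0),
  (120.0, 170.0, 18.0)
).toDF("Active", "Confirmed", "Deaths")

// Deaths is the label here, so it is not assembled into the feature vector.
val rawAssembler = new VectorAssembler()
  .setInputCols(Array("Active", "Confirmed"))
  .setOutputCol("features")

// transform adds the "features" vector column that LinearRegression expects.
val assembled = rawAssembler.transform(rawTraining.na.fill(0))

val plainLr = new LinearRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setFeaturesCol("features")
  .setLabelCol("Deaths") // the default label column name is "label"

// Fitting the estimator directly returns a LinearRegressionModel.
val plainModel = plainLr.fit(assembled)

println(s"Coefficients: ${plainModel.coefficients} Intercept: ${plainModel.intercept}")
```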