pyspark - UnsupportedOperationException: empty collection

Problem description

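Below is the code I use to train a GBM model for regression with MLlib. My data has no categorical variables; all string columns were label-encoded to integer values beforehand.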
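Since the string columns were label-encoded to integers, note that `VectorIndexer` cannot tell which columns came from strings. It treats any feature with at most `maxCategories` distinct values as categorical (the code uses `maxCategories=3`), and the `-666` fill value counts as one more distinct value. The pure-Python sketch below mimics only that decision rule and calls no Spark API; the function name `categorical_feature_indices` and the sample rows are hypothetical.

```python
# Simplified, pure-Python sketch (not Spark code) of VectorIndexer's rule for
# choosing categorical features: a feature column is treated as categorical
# when it has at most `max_categories` distinct values.
from typing import List, Sequence


def categorical_feature_indices(rows: Sequence[Sequence[float]], max_categories: int) -> List[int]:
    """Return indices of feature columns with <= max_categories distinct values."""
    if not rows:
        # This sketch refuses empty input rather than guessing a feature count.
        raise ValueError("cannot decide categories on an empty dataset")
    n_features = len(rows[0])
    result = []
    for j in range(n_features):
        distinct = {row[j] for row in rows}
        if len(distinct) <= max_categories:
            result.append(j)
    return result


# Hypothetical rows: column 0 is a label-encoded string (values 0, 1, 2),
# column 1 is a continuous measurement, column 2 contains the -666 fill value.
rows = [
    [0.0, 3.7, 1.0],
    [1.0, 5.2, -666.0],
    [2.0, 4.1, 1.0],
    [1.0, 9.9, 2.0],
]
print(categorical_feature_indices(rows, max_categories=3))
```

With `maxCategories=3`, both the label-encoded column and the column containing `-666` would be indexed as categorical here, even though the data was described as having no categorical variables.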

It is almost identical to the documentation example, but it fails to run with the error shown in the title (see link).

Spark version: 2.5

from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.feature import VectorAssembler, VectorIndexer
from pyspark.ml.regression import GBTRegressor
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# `data` (a Spark DataFrame) and `features` (a list of input column names)
# are defined earlier and are not shown here.

# Replace nulls in numeric columns with a sentinel value
data = data.na.fill(-666)

# Train/test split (70% / 30%)
(X_train, X_test) = data.randomSplit([0.7, 0.3])

# Combine the input columns into one vector, then let VectorIndexer mark
# features with at most 3 distinct values as categorical
vectorAssembler = VectorAssembler(inputCols=features, outputCol="rawFeatures")
vectorIndexer = VectorIndexer(inputCol="rawFeatures", outputCol="features", maxCategories=3)

target_var = 'class'
gbt = GBTRegressor(labelCol=target_var)  # reads the default featuresCol "features"

paramGrid = (ParamGridBuilder()
             .addGrid(gbt.maxDepth, [6])
             .addGrid(gbt.maxIter, [10])
             .build())

# Evaluation metric: mean absolute error
evaluator = RegressionEvaluator(metricName="mae", labelCol=gbt.getLabelCol(), predictionCol=gbt.getPredictionCol())

# Cross-validation over the parameter grid (default numFolds=3)
cv = CrossValidator(estimator=gbt, evaluator=evaluator, estimatorParamMaps=paramGrid)

# Pipeline: assemble -> index -> cross-validated GBT
pipeline = Pipeline(stages=[vectorAssembler, vectorIndexer, cv])

# Train the model
pipelineModel = pipeline.fit(X_train)
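In Spark's Scala `RDD.reduce`, `UnsupportedOperationException("empty collection")` is what gets thrown when the RDD has no elements, so one possible (unconfirmed) cause worth ruling out is that some dataset the pipeline trains on is empty. Both `randomSplit([0.7, 0.3])` and `CrossValidator` split rows randomly rather than by exact counts. The pure-Python sketch below is a simplified model of that idea, not Spark's implementation: each row draws a uniform number and lands in the split whose interval contains it. The helper `split_sizes` and the row counts are hypothetical.

```python
# Simplified, pure-Python sketch (not Spark code) of random splitting: each row
# draws a uniform number in [0, 1) and goes to the split whose [lower, upper)
# interval contains it. Here the bounds are normalized cumulative weights;
# k equal weights give the bounds i/k and (i+1)/k used for k folds.
# Split sizes are only approximately proportional, and with few rows a split
# can be empty.
import random
from typing import List, Sequence


def split_sizes(n_rows: int, weights: Sequence[float], seed: int) -> List[int]:
    """Count how many of n_rows fall into each weighted random split."""
    total = float(sum(weights))
    bounds = [0.0]
    for w in weights:
        bounds.append(bounds[-1] + w / total)
    bounds[-1] = 1.0  # guard against floating-point round-off
    rng = random.Random(seed)
    counts = [0] * len(weights)
    for _ in range(n_rows):
        u = rng.random()  # uniform in [0, 1)
        for i in range(len(weights)):
            if bounds[i] <= u < bounds[i + 1]:
                counts[i] += 1
                break
    return counts


# Hypothetical 10-row dataset: 0.7 / 0.3 train/test split, then 3 equal folds
# of the training part (CrossValidator's default numFolds is 3).
train, test = split_sizes(10, [0.7, 0.3], seed=1)
folds = split_sizes(train, [1, 1, 1], seed=2)
print(train, test, folds)
```

On the real data, counting rows in `data`, `X_train` and `X_test` with `DataFrame.count()` before calling `pipeline.fit` would confirm or rule out this cause.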
