Getting a ": java.util.NoSuchElementException: Param approxQuantileRelativeError does not exist" error

Problem description

I am trying to run iforest through pyspark on my local machine. (At the moment I am just trying to get the example running to make sure everything works.)

All of the code before the model is actually run seems fine, but actually fitting the model produces an error:

Py4JJavaError: An error occurred while calling o127.getParam.
: java.util.NoSuchElementException: Param approxQuantileRelativeError does not exist.
    at org.apache.spark.ml.param.Params$$anonfun$getParam$2.apply(params.scala:729)
    at org.apache.spark.ml.param.Params$$anonfun$getParam$2.apply(params.scala:729)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.ml.param.Params$class.getParam(params.scala:728)
    at org.apache.spark.ml.PipelineStage.getParam(Pipeline.scala:42)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

I have not found this error anywhere else on the internet, so could something be wrong with my configuration?

Here is the code, up to the point where it gives me the error:

import py4j
print(dir(py4j))
from pyspark import SparkConf
from pyspark.sql import SparkSession, functions as F
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark_iforest.ml.iforest import IForest, IForestModel
import tempfile

conf = SparkConf()
conf.set('spark.jars', '~/opt/anaconda3/envs/spark/lib/python3.6/site-packages/pyspark/jars/spark-iforest-2.4.0.jar')
conf.setMaster("local[2]").setAppName("My app")
conf.set("spark.driver.bindAddress", "127.0.0.1")
conf.set("spark.network.timeout", "600")
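One thing worth double-checking in the configuration above: tilde expansion is a shell feature, so the `~` in the `spark.jars` value is likely passed to Spark as a literal path. Expanding it in Python first is a cheap hedge (the path is the one from my setup, shown only for illustration):

```python
import os

# Spark's config parsing does not perform shell tilde expansion,
# so expand '~' explicitly before handing the path to spark.jars.
jar_path = os.path.expanduser(
    '~/opt/anaconda3/envs/spark/lib/python3.6/site-packages/pyspark/jars/spark-iforest-2.4.0.jar'
)
print(jar_path)  # absolute path rooted at the home directory
```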


spark = SparkSession \
        .builder \
        .config(conf=conf) \
        .appName("IForestExample") \
        .getOrCreate()
print(spark)
sc = spark.sparkContext
sc.getConf().getAll()

temp_path = tempfile.mkdtemp()
iforest_path = temp_path + "/iforest"
model_path = temp_path + "/iforest_model"

# same data as in https://gist.github.com/mkaranasou/7aa1f3a28258330679dcab4277c42419 
# for comparison
data = [
    {'feature1': 1., 'feature2': 0., 'feature3': 0.3, 'feature4': 0.01},
    {'feature1': 10., 'feature2': 3., 'feature3': 0.9, 'feature4': 0.1},
    {'feature1': 101., 'feature2': 13., 'feature4': 0.91},
    {'feature1': 111., 'feature2': 11., 'feature3': 1.2, 'feature4': 1.91},
]
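Unrelated to the Py4J error itself, note that the third row has no 'feature3'; createDataFrame will infer a null there, which VectorAssembler rejects under its default handleInvalid setting. A Spark-free sketch to flag incomplete rows before building the DataFrame:

```python
# Rows mirror the data above; the third row is missing 'feature3'.
data = [
    {'feature1': 1., 'feature2': 0., 'feature3': 0.3, 'feature4': 0.01},
    {'feature1': 10., 'feature2': 3., 'feature3': 0.9, 'feature4': 0.1},
    {'feature1': 101., 'feature2': 13., 'feature4': 0.91},
    {'feature1': 111., 'feature2': 11., 'feature3': 1.2, 'feature4': 1.91},
]
# Union of every key seen in any row, then report rows lacking some of them.
all_keys = set().union(*(row.keys() for row in data))
missing = [(i, sorted(all_keys - row.keys()))
           for i, row in enumerate(data) if all_keys - row.keys()]
print(missing)  # -> [(2, ['feature3'])]
```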

# use a VectorAssembler to gather the features as Vectors (dense)
assembler = VectorAssembler(
    inputCols=list(data[0].keys()), outputCol="features"
)

df = spark.createDataFrame(data)
df = assembler.transform(df)
df.show()


# use a StandardScaler to scale the features (as also done in https://gist.github.com/mkaranasou/7aa1f3a28258330679dcab4277c42419)
scaler = StandardScaler(inputCol='features', outputCol='scaledFeatures')
iforest = IForest(contamination=0.3, maxDepth=2)
iforest.setSeed(42)  # for reproducibility

scaler_model = scaler.fit(df)
df = scaler_model.transform(df)
df = df.withColumn('features', F.col('scaledFeatures')).drop('scaledFeatures')
model = iforest.fit(df)

The last line is where the error occurs.
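For what it's worth, `getParam` throwing NoSuchElementException for a Param the Python wrapper expects is often a symptom of the JVM-side class and the Python wrapper disagreeing, e.g. a jar built against a different Spark version than the installed PySpark. Comparing the two versions is a cheap first check; the parsing below assumes the common `name-<spark.version>.jar` naming convention, which may not hold for this jar:

```python
import re

jar_name = "spark-iforest-2.4.0.jar"  # jar referenced in the config above

# Pull out the apparent major.minor Spark version from the filename.
match = re.search(r"-(\d+\.\d+)\.\d+\.jar$", jar_name)
jar_spark_version = match.group(1)
print(jar_spark_version)  # -> 2.4

# At the call site, compare against the running PySpark, e.g.:
# import pyspark
# assert pyspark.__version__.startswith(jar_spark_version)
```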
