spark-nlp: DocumentAssembler initialization fails with "java.lang.NoClassDefFoundError: org/apache/spark/ml/util/MLWritable$class"

Problem description

I am trying out the ContextAwareSpellChecker presented in https://medium.com/spark-nlp/applying-context-aware-spell-checking-in-spark-nlp-3c29c46963bc

The first component in the pipeline is a DocumentAssembler:

from sparknlp.annotator import *
from sparknlp.base import *
import sparknlp


spark = sparknlp.start()
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

The code above fails at runtime as follows:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\__init__.py", line 110, in wrapper
    return func(self, **kwargs)
  File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\sparknlp\base.py", line 148, in __init__
    super(DocumentAssembler, self).__init__(classname="com.johnsnowlabs.nlp.DocumentAssembler")
  File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\__init__.py", **kwargs)
  File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\sparknlp\internal.py", line 72, in __init__
    self._java_obj = self._new_java_obj(classname, self.uid)
  File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\ml\wrapper.py", line 69, in _new_java_obj
    return java_obj(*java_args)
  File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py", line 1569, in __call__
  File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\sql\utils.py", line 131, in deco
    return f(*a, **kw)
  File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.com.johnsnowlabs.nlp.DocumentAssembler.
: java.lang.NoClassDefFoundError: org/apache/spark/ml/util/MLWritable$class
        at com.johnsnowlabs.nlp.DocumentAssembler.<init>(DocumentAssembler.scala:16)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)

Edit: the Apache Spark version is 2.4.6

Solution

I ran into this problem when upgrading from Spark 2.4.5 to Spark 3+ (on Databricks with Scala rather than PySpark, though). Try downgrading your Spark version.
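In other words, the `MLWritable$class` error is typically a Scala binary-compatibility mismatch: `...$class` synthetic classes exist only in code compiled for Scala 2.11, so a Spark NLP jar built against Spark 2.x / Scala 2.11 fails on Spark 3.x / Scala 2.12 (and vice versa). As a minimal sketch of the idea, the helper below picks a Maven coordinate whose Scala suffix matches the Spark major version; the function name and the default Spark NLP version are illustrative assumptions, so check the Spark NLP release notes for the coordinates that actually match your installation:

```python
def spark_nlp_coordinate(spark_version: str, spark_nlp_version: str = "2.5.5") -> str:
    """Illustrative helper: return a Maven coordinate for Spark NLP whose
    Scala suffix matches Spark's bundled Scala version
    (Scala 2.11 for Spark 2.x, Scala 2.12 for Spark 3.x)."""
    major = int(spark_version.split(".")[0])
    scala_suffix = "2.11" if major < 3 else "2.12"
    return f"com.johnsnowlabs.nlp:spark-nlp_{scala_suffix}:{spark_nlp_version}"


# For Spark 2.4.6 this selects the _2.11 artifact, which you would pass to
# pyspark/spark-submit via --packages, e.g.:
#   pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.5
print(spark_nlp_coordinate("2.4.6"))
```

If the Scala suffix of the jar on the classpath disagrees with the Spark version, either downgrade Spark (as above) or switch to the matching Spark NLP artifact.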
