Problem description
I am trying out the ContextAwareSpellChecker described in https://medium.com/spark-nlp/applying-context-aware-spell-checking-in-spark-nlp-3c29c46963bc
The first component in the pipeline is a DocumentAssembler:
from sparknlp.annotator import *
from sparknlp.base import *
import sparknlp
spark = sparknlp.start()
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
The code above fails at runtime with the following error:
Traceback (most recent call last):
File "<stdin>",line 1,in <module>
File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\__init__.py",line 110,in wrapper
return func(self,**kwargs)
File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\sparknlp\base.py",line 148,in __init__
super(DocumentAssembler,self).__init__(classname="com.johnsnowlabs.nlp.DocumentAssembler")
File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\__init__.py",**kwargs)
File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\sparknlp\internal.py",line 72,in __init__
self._java_obj = self._new_java_obj(classname,self.uid)
File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\ml\wrapper.py",line 69,in _new_java_obj
return java_obj(*java_args)
File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py",line 1569,in __call__
File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\sql\utils.py",line 131,in deco
return f(*a,**kw)
File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py",line 328,in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.com.johnsnowlabs.nlp.DocumentAssembler.
: java.lang.NoClassDefFoundError: org/apache/spark/ml/util/MLWritable$class
at com.johnsnowlabs.nlp.DocumentAssembler.<init>(DocumentAssembler.scala:16)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Edit: the Apache Spark version is 2.4.6
Solution
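I ran into this problem when upgrading from Spark 2.4.5 to Spark 3+ (although that was on Databricks with Scala). Try downgrading your Spark version. The missing class org/apache/spark/ml/util/MLWritable$class is the kind of helper class that Scala 2.11 generates for traits, so the error suggests that the Spark NLP build and the Spark runtime it was loaded into were compiled against different Scala versions.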
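Before downgrading, it may be worth confirming which PySpark version is actually installed in the Python environment, since it can differ from the Spark version you believe you are running. The following is only a minimal sketch: the helper names (installed_version, major_version, same_spark_major) are made up for illustration, it assumes Python 3.8+ for importlib.metadata, and comparing major versions is a rough heuristic rather than an official compatibility rule.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a pip package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_version(version_string):
    """Extract the major version number from a dotted version string."""
    return int(version_string.split(".")[0])


def same_spark_major(pyspark_version, expected_version):
    """Rough heuristic (an assumption): treat equal major versions as compatible."""
    return major_version(pyspark_version) == major_version(expected_version)


if __name__ == "__main__":
    expected = "2.4.6"  # the Spark version reported in the question
    found = installed_version("pyspark")
    if found is None:
        print("pyspark is not installed in this environment")
    elif same_spark_major(found, expected):
        print(f"pyspark {found} is on the same major line as Spark {expected}")
    else:
        print(f"pyspark {found} does not match the expected Spark {expected}")
```

If the two major versions differ, pinning PySpark to the version that your Spark NLP release was built for is one way to apply the downgrade suggested above.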