Spark NLP: DocumentAssembler initialization fails with 'java.lang.NoClassDefFoundError: org/apache/spark/ml/util/MLWritable$class'

odopli94  posted on 2021-05-27  in Spark

I am trying out the ContextAwareSpellChecker presented in https://medium.com/spark-nlp/applying-context-aware-spell-checking-in-spark-nlp-3c29c46963bc
The first component in the pipeline is a DocumentAssembler:

from sparknlp.annotator import *
from sparknlp.base import *
import sparknlp

spark = sparknlp.start()
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

Running the code above fails with the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\__init__.py", line 110, in wrapper
    return func(self,**kwargs)
  File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\sparknlp\base.py", line 148, in __init__
    super(DocumentAssembler, self).__init__(classname="com.johnsnowlabs.nlp.DocumentAssembler")
  File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\__init__.py", line 110, in wrapper
    return func(self,**kwargs)
  File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\sparknlp\internal.py", line 72, in __init__
    self._java_obj = self._new_java_obj(classname, self.uid)
  File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\ml\wrapper.py", line 69, in _new_java_obj
    return java_obj(*java_args)
  File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py", line 1569, in __call__
  File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\sql\utils.py", line 131, in deco
    return f(*a,**kw)
  File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.com.johnsnowlabs.nlp.DocumentAssembler.
: java.lang.NoClassDefFoundError: org/apache/spark/ml/util/MLWritable$class
        at com.johnsnowlabs.nlp.DocumentAssembler.<init>(DocumentAssembler.scala:16)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)

Edit: the Apache Spark version is 2.4.6


pb3skfrl #1

I ran into this problem when upgrading from Spark 2.4.5 to Spark 3+ (although that was on Databricks with Scala). Try downgrading your Spark version.
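
Some background on why a downgrade can help: the `$class` suffix in `MLWritable$class` is how Scala 2.11 names trait implementation classes, and Scala 2.12 no longer generates them. The error therefore usually points to a jar compiled for Scala 2.11 being loaded on a Scala 2.12 runtime, which is what default Spark 3.x builds use. Note also that the traceback mentions `py4j-0.10.9`, which ships with PySpark 3.0.x rather than 2.4.x, so the `pyspark` package inside the conda environment may be newer than the 2.4.6 mentioned in the question. The sketch below is only a rough sanity check under those assumptions; `default_scala_binary` and `is_compatible` are hypothetical helper names, not part of Spark or Spark NLP, and the version mapping only covers default builds of Spark 2.x and 3.x.

```python
import re


def default_scala_binary(spark_version: str) -> str:
    """Map a Spark version string to the Scala binary version of its default build.

    Assumption: default Spark 2.x builds use Scala 2.11 and default
    Spark 3.x builds use Scala 2.12. Custom builds may differ.
    """
    match = re.match(r"(\d+)\.(\d+)", spark_version)
    if match is None:
        raise ValueError(f"unrecognized Spark version: {spark_version!r}")
    major = int(match.group(1))
    if major == 2:
        return "2.11"
    if major == 3:
        return "2.12"
    raise ValueError(f"Spark major version {major} is not covered by this sketch")


def is_compatible(spark_version: str, jar_scala_binary: str) -> bool:
    """Return True if a jar built for jar_scala_binary matches the Spark runtime."""
    return default_scala_binary(spark_version) == jar_scala_binary


if __name__ == "__main__":
    # Report what is actually installed in the active Python environment.
    try:
        import pyspark
    except ImportError:
        print("pyspark is not installed in this environment")
    else:
        version = pyspark.__version__
        print("pyspark version:", version)
        print("expected Scala binary:", default_scala_binary(version))
        # A Spark NLP jar built for Scala 2.11 would only fit a Spark 2.x runtime.
        print("fits a Scala 2.11 jar:", is_compatible(version, "2.11"))
```

If the script reports a 3.x `pyspark`, either pin `pyspark` to a 2.4.x release that matches the Spark NLP build in use, or move to a Spark NLP release built for Spark 3 and Scala 2.12.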
