Spring Boot: "NoClassDefFoundError" when running Apache Spark Word2Vec in Java

Posted by r7knjye2 on 2022-12-23 in Spring

I'm new to Apache Spark and am trying to use its Word2Vec feature inside Spring Boot to generate synonyms, but I keep getting an error.

import java.nio.file.Paths;
import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.feature.Word2Vec;
import org.apache.spark.mllib.feature.Word2VecModel;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        .appName("Synonym Recommender")
        .config("spark.master", "local")
        .getOrCreate();

// Read the corpus as one String per line
JavaRDD<String> lines = spark.read()
        .textFile(Paths.get("src/main/resources/static/text8.txt").toString())
        .toJavaRDD();

// The RDD-based (mllib) Word2Vec expects each record to be a sequence of tokens
JavaRDD<Iterable<String>> wordsIterable = lines.map(s -> Arrays.asList(s.split(" ")));

// Train the model; fit(...) returns a Word2VecModel
Word2Vec vec = new Word2Vec();
Word2VecModel vecModel = vec.fit(wordsIterable);
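A side note on the tokenizing step: `s.split(" ")` keeps empty strings wherever a line has leading or repeated spaces, and those empty "words" would then be fed to Word2Vec. Below is a minimal stdlib-only sketch of a stricter tokenizer; the class name `Tokenizer` is hypothetical, and treating any run of whitespace as a word boundary is an assumption about the corpus.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class Tokenizer {
    // Splits a line on runs of whitespace and drops empty tokens.
    // Assumption: any whitespace (space, tab, ...) separates words.
    static List<String> tokenize(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            // "".split(...) would return [""], so handle blank lines explicitly
            return Collections.emptyList();
        }
        // After trim() there is no leading whitespace, so no leading empty token
        return Arrays.asList(trimmed.split("\\s+"));
    }
}
```

It could replace the `s.split(" ")` call in the map step, e.g. `lines.map(line -> Tokenizer.tokenize(line))` assigned to a `JavaRDD<Iterable<String>>`.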

When I run the Word2Vec code above, I get the following error (full stack trace at the bottom):

java.lang.NoClassDefFoundError: org/codehaus/janino/InternalCompilerException

Here are the relevant entries from my pom.xml:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.13</artifactId>
    <version>3.3.1</version>
    <exclusions>
        <exclusion>
            <artifactId>janino</artifactId>
            <groupId>org.codehaus.janino</groupId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>janino</artifactId>
    <version>3.1.9</version>
</dependency>

Based on a possible solution I came across, I added the janino dependency separately, but that didn't seem to help either.

Caused by: java.lang.NoClassDefFoundError: org/codehaus/janino/InternalCompilerException
at org.apache.spark.sql.catalyst.expressions.objects.GetExternalRowField.<init>(objects.scala:1850) ~[spark-catalyst_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.catalyst.encoders.RowEncoder$.$anonfun$serializerFor$3(RowEncoder.scala:195) ~[spark-catalyst_2.13-3.3.1.jar:3.3.1]
at scala.collection.ArrayOps$.flatMap$extension(ArrayOps.scala:986) ~[scala-library-2.13.0.jar:na]
at org.apache.spark.sql.catalyst.encoders.RowEncoder$.serializerFor(RowEncoder.scala:192) ~[spark-catalyst_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:73) ~[spark-catalyst_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:81) ~[spark-catalyst_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:92) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:444) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:202) ~[scala-library-2.13.0.jar:na]
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:645) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:682) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:654) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.DataFrameReader$textFile.call(Unknown Source) ~[na:na]
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47) ~[groovy-2.5.14.jar:2.5.14]
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:115) ~[groovy-2.5.14.jar:2.5.14]
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:127) ~[groovy-2.5.14.jar:2.5.14]
at com.tcwb.classification.services.USMLService.loadWord2VecModel(testapp.groovy:591) ~[classes/:na]
at com.tcwb.classification.services.USMLService.postConstruct(testapp.groovy:76) ~[classes/:na]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:na]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:na]
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[na:na]
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleElement.invoke(InitDestroyAnnotationBeanPostProcessor.java:389) ~[spring-beans-5.3.6.jar:5.3.6]
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleMetadata.invokeInitMethods(InitDestroyAnnotationBeanPostProcessor.java:333) ~[spring-beans-5.3.6.jar:5.3.6]
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:157) ~[spring-beans-5.3.6.jar:5.3.6]
... 56 common frames omitted

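If there are other, more lightweight or pre-trained alternatives I should consider for generating synonyms in Java, I'd appreciate those suggestions too.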
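One lightweight direction is to skip Spark entirely: load pre-trained word vectors into memory and rank neighbours by cosine similarity, which is the score the mllib `Word2VecModel.findSynonyms` reports. A minimal stdlib-only sketch follows; it assumes the vectors have already been parsed into a `Map<String, float[]>` (the loading step and file format are not shown), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class CosineSynonyms {
    // Cosine similarity between two vectors of equal length.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        // Treat zero vectors as dissimilar to everything
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns up to k words most similar to `word`, excluding `word` itself.
    // Returns an empty list if `word` has no vector.
    // Brute-force scan over the whole vocabulary: O(V * d) per query.
    static List<String> nearest(Map<String, float[]> vectors, String word, int k) {
        float[] q = vectors.get(word);
        List<String> result = new ArrayList<>();
        if (q == null) {
            return result;
        }
        List<Map.Entry<String, float[]>> candidates = new ArrayList<>();
        for (Map.Entry<String, float[]> e : vectors.entrySet()) {
            if (!e.getKey().equals(word)) {
                candidates.add(e);
            }
        }
        // Sort by descending cosine similarity to the query vector
        candidates.sort(Comparator.comparingDouble(
                (Map.Entry<String, float[]> e) -> cosine(q, e.getValue())).reversed());
        for (int i = 0; i < Math.min(k, candidates.size()); i++) {
            result.add(candidates.get(i).getKey());
        }
        return result;
    }
}
```

The linear scan keeps the sketch short; for a large vocabulary an approximate nearest-neighbour index would be the usual next step.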

Answer by ffx8fchx:


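So... it turns out I needed to add one more dependency, commons-compiler, alongside janino (both pinned to the same version, 3.0.8) to fix the compiler problem. That wasn't the end of my pain (more errors followed), but it does at least solve the problem at hand: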

<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>commons-compiler</artifactId>
    <version>3.0.8</version>
</dependency>
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>janino</artifactId>
    <version>3.0.8</version>
</dependency>
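When juggling Janino versions like this, it can help to check which classes actually end up on the runtime classpath, since a transitive dependency or a BOM (Spring Boot's dependency management, for example) can resolve a different version than the one you expect. The sketch below is a stdlib-only check; the class name `ClasspathCheck` is hypothetical, and `org.codehaus.commons.compiler.InternalCompilerException` is listed on the assumption that Janino 3.1.x moved that exception out of the `org.codehaus.janino` package, which would explain why Spark 3.3.1 cannot find it next to janino 3.1.9.

```java
import java.security.CodeSource;

class ClasspathCheck {
    // Returns where a class was loaded from, "<bootstrap>" for JDK classes,
    // or null if the class cannot be found or linked.
    // initialize=false avoids running static initializers.
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className, false,
                    Thread.currentThread().getContextClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "<bootstrap>" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.codehaus.janino.InternalCompilerException",           // what Spark 3.3.1 asks for in the trace
            "org.codehaus.commons.compiler.InternalCompilerException", // assumption: location in Janino 3.1.x
            "org.codehaus.commons.compiler.CompileException"           // shipped in commons-compiler
        };
        for (String n : names) {
            String where = locate(n);
            System.out.println(n + " -> " + (where == null ? "NOT FOUND" : where));
        }
    }
}
```

Run inside the application (for example from the same `@PostConstruct` method), each line shows either the jar the class came from or `NOT FOUND`; for Spark 3.3.1 the first entry is the one that has to resolve.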
