I'm new to Apache Spark and am trying to use its Word2Vec feature inside a Spring Boot application to generate synonyms, but I keep hitting an error.
import java.nio.file.Paths;
import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.feature.Word2Vec;
import org.apache.spark.mllib.feature.Word2VecModel;
import org.apache.spark.sql.SparkSession;

// Local Spark session for development
SparkSession spark = SparkSession.builder().appName("Synonym Recommender")
        .config("spark.master", "local")
        .getOrCreate();
// Read the text8 corpus and tokenize each line on spaces
JavaRDD<String> lines = spark.read().textFile(Paths.get("src/main/resources/static/text8.txt").toString()).toJavaRDD();
JavaRDD<Iterable<String>> wordsIterable = lines.map(new Function<String, Iterable<String>>() {
    public Iterable<String> call(String s) throws Exception {
        return Arrays.asList(s.split(" "));
    }
});
// Train the mllib Word2Vec model on the tokenized corpus
Word2Vec vec = new Word2Vec();
Word2VecModel vecModel = vec.fit(wordsIterable);
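Once the model is fitted, the goal is to query it for synonyms, roughly like the sketch below (the query word "day" is just a placeholder and assumes it appears in the text8 vocabulary):

import scala.Tuple2;

// Top 5 nearest words by cosine similarity (sketch; the query word is a placeholder)
Tuple2<String, Object>[] synonyms = vecModel.findSynonyms("day", 5);
for (Tuple2<String, Object> synonym : synonyms) {
    System.out.println(synonym._1() + " : " + synonym._2());
}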
When I run the training code above, I get the following error (full stack trace at the bottom):
java.lang.NoClassDefFoundError: org/codehaus/janino/InternalCompilerException
Here are the relevant entries from my pom.xml:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.13</artifactId>
    <version>3.3.1</version>
    <exclusions>
        <exclusion>
            <groupId>org.codehaus.janino</groupId>
            <artifactId>janino</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>janino</artifactId>
    <version>3.1.9</version>
</dependency>
Based on a possible solution I came across, I included the janino dependency separately, but that doesn't seem to work either.
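In case it matters, the janino artifacts that actually end up on the resolved classpath can be listed with the standard Maven dependency plugin:

mvn dependency:tree -Dincludes=org.codehaus.janino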
Caused by: java.lang.NoClassDefFoundError: org/codehaus/janino/InternalCompilerException
at org.apache.spark.sql.catalyst.expressions.objects.GetExternalRowField.<init>(objects.scala:1850) ~[spark-catalyst_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.catalyst.encoders.RowEncoder$.$anonfun$serializerFor$3(RowEncoder.scala:195) ~[spark-catalyst_2.13-3.3.1.jar:3.3.1]
at scala.collection.ArrayOps$.flatMap$extension(ArrayOps.scala:986) ~[scala-library-2.13.0.jar:na]
at org.apache.spark.sql.catalyst.encoders.RowEncoder$.serializerFor(RowEncoder.scala:192) ~[spark-catalyst_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:73) ~[spark-catalyst_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:81) ~[spark-catalyst_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:92) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:444) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:202) ~[scala-library-2.13.0.jar:na]
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:645) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:682) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:654) ~[spark-sql_2.13-3.3.1.jar:3.3.1]
at org.apache.spark.sql.DataFrameReader$textFile.call(Unknown Source) ~[na:na]
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47) ~[groovy-2.5.14.jar:2.5.14]
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:115) ~[groovy-2.5.14.jar:2.5.14]
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:127) ~[groovy-2.5.14.jar:2.5.14]
at com.tcwb.classification.services.USMLService.loadWord2VecModel(testapp.groovy:591) ~[classes/:na]
at com.tcwb.classification.services.USMLService.postConstruct(testapp.groovy:76) ~[classes/:na]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:na]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:na]
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[na:na]
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleElement.invoke(InitDestroyAnnotationBeanPostProcessor.java:389) ~[spring-beans-5.3.6.jar:5.3.6]
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleMetadata.invokeInitMethods(InitDestroyAnnotationBeanPostProcessor.java:333) ~[spring-beans-5.3.6.jar:5.3.6]
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:157) ~[spring-beans-5.3.6.jar:5.3.6]
... 56 common frames omitted
If there are other, lighter-weight or pre-trained alternatives I should consider for generating synonyms in Java, suggestions along those lines would also be appreciated.
1 Answer
So... it turns out I needed to include one more dependency to fix the compiler issue. That wasn't the end of my pain (more errors followed), but it did resolve the problem at hand: commons-compiler.
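A sketch of such a pom.xml entry, assuming it should be pinned to the same version as the janino dependency above:

<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>commons-compiler</artifactId>
    <version>3.1.9</version>
</dependency>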