Spark 1.4 is missing the Kafka library

wswtfjt7 posted on 2021-05-29 in Hadoop

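I am trying to run a Python Spark script that worked fine in Spark 1.3.1. I downloaded Spark 1.4 and tried to run the same script, but it keeps failing with this message:
Spark Streaming's Kafka libraries not found in class path. Try one of the following.
Include the Kafka library and its dependencies in the spark-submit command as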

$ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka:1.4.0 ...

Download the JAR of the artifact from Maven Central (http://search.maven.org/), Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-assembly, Version = 1.4.0. Then include the JAR in the spark-submit command as

$ bin/spark-submit --jars <spark-streaming-kafka-assembly.jar> ...

I referenced the JARs explicitly in the submit command, adding them like this:

/opt/spark/spark-1.4.0-bin-hadoop2.6/bin/spark-submit --jars spark-streaming_2.10-1.4.0.jar,spark-core_2.10-1.4.0.jar,spark-streaming-kafka-assembly_2.10-1.4.0.jar,kafka_2.10-0.8.2.1.jar,kafka-clients-0.8.2.1.jar,spark-streaming-kafka-assembly_2.10-1.4.0.jar /root/SparkPySQLNew.py

The log at application startup also says the JARs were added, so why can't Spark find them?

15/07/08 05:44:37 INFO spark.SparkContext: Added JAR file:/root/spark-streaming_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-streaming_2.10-1.4.0.jar with timestamp 1436334277792
15/07/08 05:44:37 INFO spark.SparkContext: Added JAR file:/root/spark-core_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-core_2.10-1.4.0.jar with timestamp 1436334277919
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/spark-streaming-kafka-assembly_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-streaming-kafka-assembly_2.10-1.4.0.jar with timestamp 1436334278295
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/kafka_2.10-0.8.2.1.jar at http://192.168.134.138:49637/jars/kafka_2.10-0.8.2.1.jar with timestamp 1436334278353
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/kafka-clients-0.8.2.1.jar at http://192.168.134.138:49637/jars/kafka-clients-0.8.2.1.jar with timestamp 1436334278357
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/spark-streaming-kafka-assembly_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-streaming-kafka-assembly_2.10-1.4.0.jar with timestamp 1436334278665
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/spark-streaming-kafka-assembly_2.10-1.4.0-sources.jar at http://192.168.134.138:49637/jars/spark-streaming-kafka-assembly_2.10-1.4.0-sources.jar with timestamp 1436334278666

I know I have added a lot of JARs. I added one at the start and then added another at the end.

Answer #1 (k97glaaz)

I suspect the exact answer differs for each Spark version, but based on this HCC thread, the following seems to have worked for others:

spark-submit --jars spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,spark-streaming-kafka-assembly_2.10-1.6.1.jar

At first glance, the difference is that it passes one spark-streaming-kafka assembly JAR, whereas you are submitting two.
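To check a long --jars value for repeated entries before submitting, a small helper along these lines may be useful. This is only a sketch: the function name dedupe_jars is hypothetical, the JAR list is copied from the command in the question, and it only inspects the list. Whether removing the duplicate resolves the error on Spark 1.4 is not confirmed here.

```python
from collections import Counter


def dedupe_jars(jars_arg):
    """Split a comma-separated --jars value.

    Returns (unique_jars, duplicates), both in first-seen order.
    dedupe_jars is a hypothetical helper, not part of Spark.
    """
    jars = [j.strip() for j in jars_arg.split(",") if j.strip()]
    counts = Counter(jars)
    unique = list(dict.fromkeys(jars))  # dicts keep insertion order (Python 3.7+)
    duplicates = [j for j in unique if counts[j] > 1]
    return unique, duplicates


if __name__ == "__main__":
    # JAR list taken from the spark-submit command in the question.
    jars_arg = (
        "spark-streaming_2.10-1.4.0.jar,spark-core_2.10-1.4.0.jar,"
        "spark-streaming-kafka-assembly_2.10-1.4.0.jar,kafka_2.10-0.8.2.1.jar,"
        "kafka-clients-0.8.2.1.jar,spark-streaming-kafka-assembly_2.10-1.4.0.jar"
    )
    unique, duplicates = dedupe_jars(jars_arg)
    print("listed more than once:", duplicates)
    print("--jars " + ",".join(unique))
```

Run against the question's command, this reports the Kafka assembly JAR as the only repeated entry and prints a --jars value with each file listed once.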
