ClassNotFoundException: Failed to find data source when writing to Cassandra

Posted by zvms9eto on 2021-06-09 in Cassandra

I have hundreds of Spark jobs running in parallel on a Cloudera cluster, writing to a Cassandra cluster throughout the day. Every day, when the cluster is under pressure in this multi-tenant environment, a handful of jobs fail with the error below; the same jobs succeed on a subsequent rerun without any change to the Spark job.

java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.cassandra. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:635)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:241)
    at my.code.CassandraWriter$.writeDataframeToDC(CassandraWriter.scala:43)
    at my.code.CassandraWriter$.writeDataframe(CassandraWriter.scala:18)
    at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach_quick(ParArray.scala:143)
    at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach(ParArray.scala:136)
    at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:972)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
    at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
    at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:969)
    at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
    at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
    at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.cassandra.DefaultSource
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
    at scala.util.Try.orElse(Try.scala:84)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:618)
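
For context, the failing frames in CassandraWriter.scala correspond to the connector's standard DataFrame write path. A minimal sketch of what such a call site typically looks like (the method signature and the keyspace/table parameters are hypothetical, not from the post):

    import org.apache.spark.sql.DataFrame

    object CassandraWriter {
      // Hypothetical reconstruction of the failing call site; keyspace and
      // table are placeholders, not names from the original post.
      def writeDataframeToDC(df: DataFrame, keyspace: String, table: String): Unit = {
        df.write
          // Resolved via DataSource.lookupDataSource, which is the frame
          // throwing the ClassNotFoundException in the trace above
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> keyspace, "table" -> table))
          .mode("append")
          .save()
      }
    }

The ParArray frames in the trace indicate the writes are driven from a Scala parallel collection, e.g. something like dataframes.par.foreach(writeDataframe), so several writes pass through lookupDataSource concurrently.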

My project is not packaged as a fat jar. The DataStax dependency jar is available under the given path, with this name:

.../distributedLibs/spark-cassandra-connector_2.11-2.3.0.jar
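
One quick sanity check (my suggestion, not from the post) is to confirm that the class named in the "Caused by" line is actually packaged in that jar; ${distributedLib_dir} below is a hypothetical stand-in for the truncated path above:

    # Verify the class from the "Caused by" line is present in the connector jar
    jar tf ${distributedLib_dir}/spark-cassandra-connector_2.11-2.3.0.jar | grep org/apache/spark/sql/cassandra/DefaultSource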

The idea is that only the jars in the distributedLibs folder are distributed to the cluster nodes, while all the jars in the lib folder are added to the classpath at spark-submit time. This is done to reduce the space each Spark job's working directory needs on the cluster nodes.
My spark-submit command looks like this:

spark-submit \
    --jars "${distributedLib_classpath}" \
    --driver-class-path ${distributedLib_classpath} \
    --conf spark.executor.extraClassPath=${distributedLib_classpath} \
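
Note that --jars takes a comma-separated list of jars to ship with the application, whereas --driver-class-path and spark.executor.extraClassPath take an ordinary colon-separated classpath, so the same variable generally cannot serve both and may be worth double-checking here. A sketch of building both forms from one directory (distributedLib_dir is again a hypothetical placeholder):

    # Build a comma-separated list for --jars and a colon-separated
    # classpath for the driver/executor options from the same directory.
    # (Assumes no spaces in jar file names.)
    distributedLib_jars=$(echo ${distributedLib_dir}/*.jar | tr ' ' ',')
    distributedLib_classpath=$(echo ${distributedLib_dir}/*.jar | tr ' ' ':')

    spark-submit \
        --jars "${distributedLib_jars}" \
        --driver-class-path "${distributedLib_classpath}" \
        --conf spark.executor.extraClassPath="${distributedLib_classpath}" \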

What causes this problem, and why does it go away on rerun? How can I make sure such failures do not happen?
The failures typically occur when many jobs run in parallel on the cluster and there is resource contention.

No answers yet.

