I'm connecting Spark on HDP 3.0 to Cassandra and trying to write a DataFrame to a Cassandra table, but I get the following error: [error screenshot; the full traceback is reproduced in the first answer below]. The code I use to write to the Cassandra table is: [code screenshot]. Thanks a lot!!
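The code screenshot did not survive, but the call visible at line 24 of the traceback (.option(table="users", keyspace="movielens")) suggests a write along these lines. A minimal sketch, assuming the DataFrame is named df and the standard Spark Cassandra Connector data source is used (the format string matches the org.apache.spark.sql.cassandra classes in the trace):

# Hypothetical reconstruction of the failing write; `df` and the exact
# builder calls are assumptions inferred from the traceback below.
df.write \
    .format("org.apache.spark.sql.cassandra") \
    .options(table="users", keyspace="movielens") \
    .save()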
2 Answers

q5iwbnjs1#
The exception is the following:

Traceback (most recent call last):
  File "/etc/yum.repos.d/cassandraspark.py", line 24, in <module>
    .option(table="users", keyspace="movielens")
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 703, in save
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o97.save.
: java.lang.NoClassDefFoundError: org/apache/commons/configuration/ConfigurationException
    at org.apache.spark.sql.cassandra.DefaultSource$.<init>(DefaultSource.scala:135)
    at org.apache.spark.sql.cassandra.DefaultSource$.<clinit>(DefaultSource.scala)
    at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:82)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.ConfigurationException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 32 more
20/04/05 21:07:57 INFO SparkContext: Invoking stop() from shutdown hook
20/04/05 21:07:57 INFO AbstractConnector: Stopped Spark@724de990{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/04/05 21:07:57 INFO SparkUI: Stopped Spark web UI at http://sandbox-hdp.hortonworks.com:4040
20/04/05 21:07:57 INFO YarnClientSchedulerBackend: Interrupting monitor thread
20/04/05 21:07:58 INFO YarnClientSchedulerBackend: Shutting down all executors
20/04/05 21:07:58 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
20/04/05 21:07:58 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices(serviceOption=None, services=List(), started=false)
20/04/05 21:07:58 INFO YarnClientSchedulerBackend: Stopped
20/04/05 21:07:58 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/04/05 21:07:58 INFO MemoryStore: MemoryStore cleared
20/04/05 21:07:58 INFO BlockManager: BlockManager stopped
20/04/05 21:07:58 INFO BlockManagerMaster: BlockManagerMaster stopped
20/04/05 21:07:58 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/04/05 21:07:58 INFO SparkContext: Successfully stopped SparkContext
20/04/05 21:07:58 INFO ShutdownHookManager: Shutdown hook called
20/04/05 21:07:58 INFO ShutdownHookManager: Deleting directory /tmp/spark-4b615cf3-aab0-44e7-bc4f-ef8039b2a26e
20/04/05 21:07:58 INFO ShutdownHookManager: Deleting directory /tmp/spark-8c4e6b45-5ade-4e73-b9b7-ec10694bf145
20/04/05 21:07:58 INFO ShutdownHookManager: Deleting directory /tmp/spark-4b615cf3-aab0-44e7-bc4f-ef8039b2a26e/pyspark-9b577311-43b8-4608-857e-5b0ab52553e2
pobjuy322#
HDP 3.0 is based on Hadoop 3.1.1, which uses the commons-configuration2 library instead of the commons-configuration library that the Spark Cassandra Connector relies on. You can get spark-shell or spark-submit working by explicitly adding commons-configuration:

spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.3.1,commons-configuration:commons-configuration:1.10
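The same --packages flag works for spark-submit. A sketch for the PySpark script named in the traceback (the script path is taken from the traceback; everything else mirrors the spark-shell command above):

spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.11:2.3.1,commons-configuration:commons-configuration:1.10 /etc/yum.repos.d/cassandraspark.py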