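I'm trying to test the Spark-HBase connector in a GCP context and am following [1], which asks you to package the connector [2] locally with Maven (I tried Maven 3.6.3) for Spark 2.4. After completing [3], I get the error below when submitting the job on Dataproc.
Any idea what is going wrong?
Thanks for your support.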
References
[1] https://github.com/googlecloudplatform/cloud-bigtable-examples/tree/master/scala/bigtable-shc
[2] https://github.com/hortonworks-spark/shc/tree/branch-2.4
[3] Spark-HBase - GCP template (1/3) - How to locally package the Hortonworks connector?
Command
(base) gcloud dataproc jobs submit spark --cluster $SPARK_CLUSTER --class com.example.bigtable.spark.shc.BigtableSource --jars target/scala-2.11/cloud-bigtable-dataproc-spark-shc-assembly-0.1.jar --region us-east1 -- $BIGTABLE_TABLE
Error
Job [d3b9107ae5e2462fa71689cb0f5909bd] submitted.
Waiting for job output...
20/12/27 12:50:10 INFO org.spark_project.jetty.util.log: Logging initialized @2475ms
20/12/27 12:50:10 INFO org.spark_project.jetty.server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
20/12/27 12:50:10 INFO org.spark_project.jetty.server.Server: Started @2576ms
20/12/27 12:50:10 INFO org.spark_project.jetty.server.AbstractConnector: Started ServerConnector@3e6cb045{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/12/27 12:50:10 WARN org.apache.spark.scheduler.FairSchedulableBuilder: Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.
20/12/27 12:50:11 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at spark-cluster-m/10.142.0.10:8032
20/12/27 12:50:11 INFO org.apache.hadoop.yarn.client.AHSProxy: Connecting to Application History server at spark-cluster-m/10.142.0.10:10200
20/12/27 12:50:13 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1609071162129_0002
Exception in thread "main" java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.parse$default$3()Z
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:262)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.<init>(HBaseRelation.scala:84)
    at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:61)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
    at com.example.bigtable.spark.shc.BigtableSource$.delayedEndpoint$com$example$bigtable$spark$shc$BigtableSource$1(BigtableSource.scala:56)
    at com.example.bigtable.spark.shc.BigtableSource$delayedInit$body.apply(BigtableSource.scala:19)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at com.example.bigtable.spark.shc.BigtableSource$.main(BigtableSource.scala:19)
    at com.example.bigtable.spark.shc.BigtableSource.main(BigtableSource.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:890)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:217)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/12/27 12:50:20 INFO org.spark_project.jetty.server.AbstractConnector: Stopped Spark@3e6cb045{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
1 Answer
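Consider reading these related SO questions: 1 and 2.
Under the hood, the tutorial you are following, like one of those questions, uses the Apache Spark - Apache HBase connector provided by Hortonworks.
The problem seems to be an incompatible version of the json4s library. A NoSuchMethodError like this one usually means the code was compiled against a different version of a library than the one found on the classpath at run time. In both cases, using version 3.2.10 or 3.2.11 during the build resolves the issue. Add the following dependency to pom.xml (shc-core):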
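As a rough sketch, such a dependency entry could look like the fragment below. The coordinates are assumptions rather than a quote from the answer: the json4s-jackson module is chosen because the stack trace fails in org.json4s.jackson.JsonMethods, the _2.11 suffix because the job jar is built under target/scala-2.11, and 3.2.11 is one of the two versions mentioned above.

```xml
<!-- Assumed coordinates: place inside the <dependencies> section of the shc-core pom.xml. -->
<!-- The _2.11 suffix must match the Scala version the connector is built for. -->
<dependency>
  <groupId>org.json4s</groupId>
  <artifactId>json4s-jackson_2.11</artifactId>
  <version>3.2.11</version>
</dependency>
```

After changing the pom, the connector has to be repackaged and the assembly jar rebuilt before resubmitting the job, so that the new json4s version is the one the connector is compiled against.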