Apache Spark: Connecting from Dataproc to Cloud SQL using the Cloud SQL Proxy

lb3vh1jj  posted on 2023-05-07  in  Apache

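I'm trying to access Cloud SQL from Dataproc through the Cloud SQL Proxy (without using Hive), with Scala 2.11.12. There are similar questions on SO, but none of them answers the problem I'm facing.
I have managed to connect Dataproc to Cloud SQL with spark.master set to "local" mode, but I get an exception in "yarn" mode, so I must be missing something.
The application crashes when executing the following: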

SparkSession
  .builder()
  .appName("SomeSparkJob")
  .getOrCreate()

When the job is submitted and reaches the .getOrCreate() call above, I get this exception:

Exception in thread "main" java.lang.NoSuchFieldError: ASCII
        at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationSubmissionContextPBImpl.checkTags(ApplicationSubmissionContextPBImpl.java:287)
        at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationSubmissionContextPBImpl.setApplicationTags(ApplicationSubmissionContextPBImpl.java:302)
        at org.apache.spark.deploy.yarn.Client$$anonfun$createApplicationSubmissionContext$2.apply(Client.scala:245)
        at org.apache.spark.deploy.yarn.Client$$anonfun$createApplicationSubmissionContext$2.apply(Client.scala:244)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.deploy.yarn.Client.createApplicationSubmissionContext(Client.scala:244)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:180)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:183)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
        at dev.ancor.somedataprocsparkjob.SomeSparkJob$.main(SomeSparkJob.scala:13)
        at dev.ancor.somedataprocsparkjob.SomeSparkJob.main(SomeSparkJob.scala)

The question is: why do I get this exception when running in "yarn" mode, and how can I fix it? Thanks, everyone!

des4xlb0  #1

As Gabe Weiss and David Rabinowitz confirmed, we can put the Dataproc cluster and Cloud SQL in a VPC network and use only the private IP. There is no need to use the Cloud SQL Proxy.
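For illustration, here is a minimal sketch of what reading from Cloud SQL over its private IP might look like from the Spark job, using a plain JDBC URL instead of the proxy. Everything specific here is an assumption rather than something from the question: a MySQL instance, the address 10.0.0.5, the names some_db and some_table, the DB_USER and DB_PASSWORD environment variables, and a MySQL JDBC driver jar already available to the job.

```scala
import java.util.Properties

import org.apache.spark.sql.{DataFrame, SparkSession}

object PrivateIpJdbcSketch {

  // Builds a plain JDBC URL that points straight at the instance's private IP.
  // Assumes a MySQL instance; a PostgreSQL instance would need a different URL scheme.
  def mysqlJdbcUrl(privateIp: String, port: Int, database: String): String =
    s"jdbc:mysql://$privateIp:$port/$database"

  def main(args: Array[String]): Unit = {
    // No master is set here, so the value supplied at submit time (e.g. yarn) is used.
    val spark = SparkSession
      .builder()
      .appName("SomeSparkJob")
      .getOrCreate()

    // Hypothetical credentials, read from the environment for this sketch only.
    val props = new Properties()
    props.setProperty("user", sys.env.getOrElse("DB_USER", "some_user"))
    props.setProperty("password", sys.env.getOrElse("DB_PASSWORD", ""))

    // Hypothetical private IP, database, and table names.
    val url = mysqlJdbcUrl("10.0.0.5", 3306, "some_db")
    val df: DataFrame = spark.read.jdbc(url, "some_table", props)

    df.show(5)
    spark.stop()
  }
}
```

Because the connection is ordinary JDBC over the VPC, the job itself no longer needs any Cloud SQL Proxy specific dependency; only the database's JDBC driver has to be on the classpath.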
