Unable to read from S3 with Spark when using the hadoop-aws 2.8.0 jar; unable to write a Delta table to S3 when using hadoop-aws 2.7.3

5uzkadbs · posted 2021-05-31 in Hadoop

When I use the hadoop-aws 2.8.0 jar, I cannot access S3 from Spark at all. Basically, I want to read a (Parquet) file from S3 and write it back to S3 as a Delta table.

//Spark shell command
spark-shell --packages org.apache.hadoop:hadoop-aws:2.8.0,io.delta:delta-core_2.11:0.5.0,com.amazonaws:aws-java-sdk:1.10.4

sc.hadoopConfiguration.set("fs.s3a.access.key", "xxx")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "yyy")  
val tempDF=spark.read.option("basePath","s3a://xxxx/tt/").load("s3a://xxxx/tt/y=1","s3a://xxxx/tt/y=2")
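// For reference, the Delta write I ultimately want to run is the same save shown
// in the second snippet further down (never reached here, since the read itself fails):
tempDF.write.format("delta").partitionBy("cols").save("s3a://xxxx/tt_1/")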

Error:

java.lang.IllegalAccessError: tried to access method org.apache.hadoop.metrics2.lib.MutableCounterLong.<init>(Lorg/apache/hadoop/metrics2/MetricsInfo;J)V from class org.apache.hadoop.fs.s3a.S3AInstrumentation
  at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:194)
  at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:216)
  at org.apache.hadoop.fs.s3a.S3AInstrumentation.<init>(S3AInstrumentation.java:139)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:174)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:547)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:355)
  at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:545)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  ... 49 elided

However, when I use the hadoop-aws 2.7.3 jar, I am able to read but not to write as a Delta table. Writing as plain Parquet works fine.

//spark shell

spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.3,io.delta:delta-core_2.11:0.5.0,com.amazonaws:aws-java-sdk:1.7.4

sc.hadoopConfiguration.set("fs.s3a.access.key", "xxx")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "yyy")  
val tempDF=spark.read.option("basePath","s3a://xxxx/tt/").load("s3a://xxxx/tt/y=1","s3a://xxxx/tt/y=2")
sc.hadoopConfiguration.set("fs.AbstractFileSystem.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
tempDF.write.partitionBy("cols").save("s3a://xxxx/tt_1/") // works fine

tempDF.write.format("delta").partitionBy("cols").save("s3a://xxxx/tt_1/") // ERROR

I have searched many pages and forums for a solution, with no luck. One answer I found (to "Reading S3 data from a Spark job on a cluster gives IllegalAccessError: tried to access method MutableCounterLong") says: "hadoop-aws-2.7.x was built against AWS SDK 1.7.4; Hadoop 2.8's version is 1.10. As I recall, we only switched to 1.11 in Hadoop 2.9 (jira.apache.org/jira/browse/hadoop-13050). If you have used it and it works, then you haven't tried running the hadoop-aws integration tests."
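If I read that correctly, the hadoop-aws and aws-java-sdk versions have to line up, something like the invocation below. The 1.10.6 here is only my guess at the matching patch level for Hadoop 2.8, taken from the "Hadoop 2.8's version is 1.10" remark; I have not verified it against the hadoop-aws 2.8.0 POM:

// Pairings implied by the quote above:
//   hadoop-aws 2.7.x -> aws-java-sdk 1.7.4
//   hadoop-aws 2.8.x -> aws-java-sdk 1.10.x
//   hadoop-aws 2.9.x -> aws-java-sdk 1.11.x
spark-shell --packages org.apache.hadoop:hadoop-aws:2.8.0,com.amazonaws:aws-java-sdk:1.10.6,io.delta:delta-core_2.11:0.5.0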
I have tried many approaches to resolve it, without success. Please point me in the right direction: is this a dependency problem or a version mismatch? Error:

20/04/15 12:49:21 WARN DeltaLog: Failed to parse s3a://xxxx/tt_1/_delta_log/_last_checkpoint. This may happen if there was an error during read operation, or a file appears to be partial. Sleeping and trying again.
java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3AFileSystem.<init>(java.net.URI, org.apache.hadoop.conf.Configuration)
        at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:135)
        at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:164)
        at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:249)
        at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:334)
        at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:331)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:331)
        at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:448)
        at org.apache.spark.sql.delta.storage.HDFSLogStore.getFileContext(HDFSLogStore.scala:53)
        at org.apache.spark.sql.delta.storage.HDFSLogStore.read(HDFSLogStore.scala:57)
        at org.apache.spark.sql.delta.Checkpoints$class.loadMetadataFromFile(Checkpoints.scala:139)
        at org.apache.spark.sql.delta.Checkpoints$class.lastCheckpoint(Checkpoints.scala:133)
        at org.apache.spark.sql.delta.DeltaLog.lastCheckpoint(DeltaLog.scala:58)
        at org.apache.spark.sql.delta.DeltaLog.<init>(DeltaLog.scala:139)
        at org.apache.spark.sql.delta.DeltaLog$$anon$3$$anonfun$call$1$$anonfun$apply$10.apply(DeltaLog.scala:744)
        at org.apache.spark.sql.delta.DeltaLog$$anon$3$$anonfun$call$1$$anonfun$apply$10.apply(DeltaLog.scala:744)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
        at org.apache.spark.sql.delta.DeltaLog$$anon$3$$anonfun$call$1.apply(DeltaLog.scala:743)
        at org.apache.spark.sql.delta.DeltaLog$$anon$3$$anonfun$call$1.apply(DeltaLog.scala:743)
        at com.databricks.spark.util.DatabricksLogging$class.recordOperation(DatabricksLogging.scala:77)
        at org.apache.spark.sql.delta.DeltaLog$.recordOperation(DeltaLog.scala:671)
        at org.apache.spark.sql.delta.metering.DeltaLogging$class.recordDeltaOperation(DeltaLogging.scala:103)
        at org.apache.spark.sql.delta.DeltaLog$.recordDeltaOperation(DeltaLog.scala:671)
        at org.apache.spark.sql.delta.DeltaLog$$anon$3.call(DeltaLog.scala:742)
        at org.apache.spark.sql.delta.DeltaLog$$anon$3.call(DeltaLog.scala:740)
        at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
        at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
        at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
        at org.apache.spark.sql.delta.DeltaLog$.apply(DeltaLog.scala:740)
        at org.apache.spark.sql.delta.DeltaLog$.forTable(DeltaLog.scala:702)
        at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:126)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
        at $line26.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:27)
        at $line26.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:32)
        at $line26.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:34)
        at $line26.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:36)
        at $line26.$read$$iw$$iw$$iw$$iw.<init>(<console>:38)
        at $line26.$read$$iw$$iw$$iw.<init>(<console>:40)
        at $line26.$read$$iw$$iw.<init>(<console>:42)
        at $line26.$read$$iw.<init>(<console>:44)
        at $line26.$read.<init>(<console>:46)
        at $line26.$read$.<init>(<console>:50)
        at $line26.$read$.<clinit>(<console>)
        at $line26.$eval$.$print$lzycompute(<console>:7)
        at $line26.$eval$.$print(<console>:6)
        at $line26.$eval.$print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:793)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1054)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:645)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:644)
        at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:644)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:576)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:572)
        at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:819)
        at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:691)
        at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:404)
        at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:425)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:285)
        at org.apache.spark.repl.SparkILoop.runClosure(SparkILoop.scala:159)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:182)
        at org.apache.spark.repl.Main$.doMain(Main.scala:78)
        at org.apache.spark.repl.Main$.main(Main.scala:58)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3AFileSystem.<init>(java.net.URI, org.apache.hadoop.conf.Configuration)
        at java.lang.Class.getConstructor0(Class.java:3082)
        at java.lang.Class.getDeclaredConstructor(Class.java:2178)
        at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:129)
        ... 106 more
20/04/15 12:49:22 WARN DeltaLog: Failed to parse s3a://xxxx/tt_1/_delta_log/_last_checkpoint. This may happen if there was an error during read operation, or a file appears to be partial. Sleeping and trying again.
20/04/15 12:49:23 WARN DeltaLog: Failed to parse s3a://xxxx/tt_1/_delta_log/_last_checkpoint. This may happen if there was an error during read operation, or a file appears to be partial. Sleeping and trying again.
(the same java.lang.NoSuchMethodException stack trace repeats on each retry)
1cklez4t 1#

I ran into a similar issue when using Hadoop 2.8.x. I believe the java.lang.IllegalAccessError you are getting is because the hadoop-common library does not match the one your Spark installation was built against. I resolved it by installing the "Spark without Hadoop" distribution and setting the SPARK_DIST_CLASSPATH environment variable to the Hadoop classpath.
https://spark.apache.org/docs/latest/hadoop-provided.html
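A minimal sketch of that setup (the Hadoop install path below is a placeholder for your environment):

# conf/spark-env.sh of a "Spark without Hadoop" (spark-x.y.z-bin-without-hadoop) build:
# point Spark at the classpath of a standalone Hadoop install, so hadoop-common and
# hadoop-aws come from the same Hadoop release instead of Spark's bundled 2.7.x jars.
export SPARK_DIST_CLASSPATH=$(/opt/hadoop-2.8.0/bin/hadoop classpath)

Note that hadoop-aws usually sits under share/hadoop/tools/lib, which may not be on the default hadoop classpath output, so you may need to append it (and its matching aws-java-sdk jar) yourself.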
