Apache Spark hadoop-common / hadoop-aws / aws-java-sdk-bundle version compatibility?

hgb9j2n6  posted on 2023-10-23  in  Apache

When I try to read from S3, I get this exception on a worker:

java.lang.NoSuchMethodError: 'java.lang.Object org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(org.apache.hadoop.fs.statistics.DurationTracker, org.apache.hadoop.util.functional.CallableRaisingIOE)'

The troubleshooting page, along with many other answers I found for this problem, all say to check my versions. They are:
Spark 3.5.0

scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-sql" % "3.5.0",
    "mysql" % "mysql-connector-java" % "8.0.33",
    "org.apache.hadoop" % "hadoop-common" % "3.3.6",
    "org.apache.hadoop" % "hadoop-aws" % "3.3.6",
    "com.amazonaws" % "aws-java-sdk-bundle" % "1.12.367"
)

hadoop-aws/3.3.6 identifies aws-java-sdk-bundle 1.12.367 as its dependency, which is what I have.
Do the versions look right? Am I missing anything else?
Thank you.
Command line:

~/spark/spark-3.5.0-bin-hadoop3/bin/spark-submit \
    --jars ~/mysql-connector/mysql-connector-j-8.1.0/mysql-connector-j-8.1.0.jar,/home/pav/.cache/coursier/v1/https/repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/3.3.6/hadoop-common-3.3.6.jar,/home/pav/.cache/coursier/v1/https/repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.6/hadoop-aws-3.3.6.jar,/home/pav/.cache/coursier/v1/https/repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.367/aws-java-sdk-bundle-1.12.367.jar \
    --driver-class-path ~/mysql-connector/mysql-connector-j-8.1.0/mysql-connector-j-8.1.0.jar \
    --master spark://10.0.10.10:7077 \
    --executor-cores 1 \
    --executor-memory 450M \
    --deploy-mode client \
    target/scala-2.12/myapp_2.12-0.1.jar

Worker log:

23/10/19 13:56:11 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
23/10/19 13:56:11 INFO Executor: Created or updated repl class loader org.apache.spark.util.MutableURLClassLoader@726b1ac5 for default.
23/10/19 13:56:11 INFO Executor: Fetching spark://10.0.10.10:34671/jars/mysql-connector-j-8.1.0.jar with timestamp 1697759762888
23/10/19 13:56:11 INFO TransportClientFactory: Successfully created connection to /10.0.10.10:34671 after 0 ms (0 ms spent in bootstraps)
23/10/19 13:56:11 INFO Utils: Fetching spark://10.0.10.10:34671/jars/mysql-connector-j-8.1.0.jar to /tmp/spark-7deed1d1-ceaf-4bd1-beff-61b8aab5b370/executor-5cda06d2-3c85-4007-8413-712568781695/spark-b69cf0ba-b08d-4800-9178-addf94b5a015/fetchFileTemp11335734498750460964.tmp
23/10/19 13:56:11 INFO Utils: Copying /tmp/spark-7deed1d1-ceaf-4bd1-beff-61b8aab5b370/executor-5cda06d2-3c85-4007-8413-712568781695/spark-b69cf0ba-b08d-4800-9178-addf94b5a015/12449360971697759762888_cache to /home/pav/spark/spark-3.5.0-bin-hadoop3/work/app-20231019135603-0098/4/./mysql-connector-j-8.1.0.jar
23/10/19 13:56:11 INFO Executor: Adding file:/home/pav/spark/spark-3.5.0-bin-hadoop3/work/app-20231019135603-0098/4/./mysql-connector-j-8.1.0.jar to class loader default
23/10/19 13:56:11 INFO Executor: Fetching spark://10.0.10.10:34671/jars/aws-java-sdk-bundle-1.12.367.jar with timestamp 1697759762888
23/10/19 13:56:11 INFO Utils: Fetching spark://10.0.10.10:34671/jars/aws-java-sdk-bundle-1.12.367.jar to /tmp/spark-7deed1d1-ceaf-4bd1-beff-61b8aab5b370/executor-5cda06d2-3c85-4007-8413-712568781695/spark-b69cf0ba-b08d-4800-9178-addf94b5a015/fetchFileTemp317765236136671316.tmp
23/10/19 13:56:11 INFO Utils: Copying /tmp/spark-7deed1d1-ceaf-4bd1-beff-61b8aab5b370/executor-5cda06d2-3c85-4007-8413-712568781695/spark-b69cf0ba-b08d-4800-9178-addf94b5a015/4147033691697759762888_cache to /home/pav/spark/spark-3.5.0-bin-hadoop3/work/app-20231019135603-0098/4/./aws-java-sdk-bundle-1.12.367.jar
23/10/19 13:56:11 INFO Executor: Adding file:/home/pav/spark/spark-3.5.0-bin-hadoop3/work/app-20231019135603-0098/4/./aws-java-sdk-bundle-1.12.367.jar to class loader default
23/10/19 13:56:11 INFO Executor: Fetching spark://10.0.10.10:34671/jars/hadoop-aws-3.3.6.jar with timestamp 1697759762888
23/10/19 13:56:11 INFO Utils: Fetching spark://10.0.10.10:34671/jars/hadoop-aws-3.3.6.jar to /tmp/spark-7deed1d1-ceaf-4bd1-beff-61b8aab5b370/executor-5cda06d2-3c85-4007-8413-712568781695/spark-b69cf0ba-b08d-4800-9178-addf94b5a015/fetchFileTemp568254465488049915.tmp
23/10/19 13:56:11 INFO Utils: Copying /tmp/spark-7deed1d1-ceaf-4bd1-beff-61b8aab5b370/executor-5cda06d2-3c85-4007-8413-712568781695/spark-b69cf0ba-b08d-4800-9178-addf94b5a015/13522706721697759762888_cache to /home/pav/spark/spark-3.5.0-bin-hadoop3/work/app-20231019135603-0098/4/./hadoop-aws-3.3.6.jar
23/10/19 13:56:11 INFO Executor: Adding file:/home/pav/spark/spark-3.5.0-bin-hadoop3/work/app-20231019135603-0098/4/./hadoop-aws-3.3.6.jar to class loader default
23/10/19 13:56:11 INFO Executor: Fetching spark://10.0.10.10:34671/jars/myapp_2.12-0.1.jar with timestamp 1697759762888
23/10/19 13:56:11 INFO Utils: Fetching spark://10.0.10.10:34671/jars/myapp_2.12-0.1.jar to /tmp/spark-7deed1d1-ceaf-4bd1-beff-61b8aab5b370/executor-5cda06d2-3c85-4007-8413-712568781695/spark-b69cf0ba-b08d-4800-9178-addf94b5a015/fetchFileTemp16642361919099310001.tmp
23/10/19 13:56:11 INFO Utils: Copying /tmp/spark-7deed1d1-ceaf-4bd1-beff-61b8aab5b370/executor-5cda06d2-3c85-4007-8413-712568781695/spark-b69cf0ba-b08d-4800-9178-addf94b5a015/-14941235251697759762888_cache to /home/pav/spark/spark-3.5.0-bin-hadoop3/work/app-20231019135603-0098/4/./myapp_2.12-0.1.jar
23/10/19 13:56:11 INFO Executor: Adding file:/home/pav/spark/spark-3.5.0-bin-hadoop3/work/app-20231019135603-0098/4/./myapp_2.12-0.1.jar to class loader default
23/10/19 13:56:11 INFO Executor: Fetching spark://10.0.10.10:34671/jars/hadoop-common-3.3.6.jar with timestamp 1697759762888
23/10/19 13:56:11 INFO Utils: Fetching spark://10.0.10.10:34671/jars/hadoop-common-3.3.6.jar to /tmp/spark-7deed1d1-ceaf-4bd1-beff-61b8aab5b370/executor-5cda06d2-3c85-4007-8413-712568781695/spark-b69cf0ba-b08d-4800-9178-addf94b5a015/fetchFileTemp10833889626368804294.tmp
23/10/19 13:56:11 INFO Utils: Copying /tmp/spark-7deed1d1-ceaf-4bd1-beff-61b8aab5b370/executor-5cda06d2-3c85-4007-8413-712568781695/spark-b69cf0ba-b08d-4800-9178-addf94b5a015/-18872084561697759762888_cache to /home/pav/spark/spark-3.5.0-bin-hadoop3/work/app-20231019135603-0098/4/./hadoop-common-3.3.6.jar
23/10/19 13:56:11 INFO Executor: Adding file:/home/pav/spark/spark-3.5.0-bin-hadoop3/work/app-20231019135603-0098/4/./hadoop-common-3.3.6.jar to class loader default
23/10/19 13:56:12 INFO CoarseGrainedExecutorBackend: Got assigned task 3
23/10/19 13:56:12 INFO Executor: Running task 0.3 in stage 0.0 (TID 3)
23/10/19 13:56:12 INFO TorrentBroadcast: Started reading broadcast variable 0 with 1 pieces (estimated total size 4.0 MiB)
23/10/19 13:56:12 INFO TransportClientFactory: Successfully created connection to /10.0.10.10:37983 after 0 ms (0 ms spent in bootstraps)
23/10/19 13:56:12 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 37.9 KiB, free 90.0 MiB)
23/10/19 13:56:12 INFO TorrentBroadcast: Reading broadcast variable 0 took 30 ms
23/10/19 13:56:12 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 105.1 KiB, free 89.9 MiB)
23/10/19 13:56:12 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
23/10/19 13:56:12 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
23/10/19 13:56:12 INFO MetricsSystemImpl: s3a-file-system metrics system started
23/10/19 13:56:14 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[#65,readingParquetFooters-ForkJoinPool-1-worker-1,5,main]
java.lang.NoSuchMethodError: 'java.lang.Object org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(org.apache.hadoop.fs.statistics.DurationTracker, org.apache.hadoop.util.functional.CallableRaisingIOE)'
    at org.apache.hadoop.fs.s3a.Invoker.onceTrackingDuration(Invoker.java:147)
    at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:282)
    at org.apache.hadoop.fs.s3a.S3AInputStream.lambda$lazySeek$1(S3AInputStream.java:435)
    at org.apache.hadoop.fs.s3a.Invoker.lambda$maybeRetry$3(Invoker.java:284)
    at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)

gtlvzcf8 #1

Apache Spark v3.5.0 is built against Hadoop 3.3.4.
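
A quick way to see which Hadoop version is really on the classpath is to ask Hadoop itself. The snippet below is a minimal sketch meant to be pasted into spark-shell from the same Spark distribution; it only reports what the driver sees, and it assumes the workers run the same distribution. The executor log above shows userClassPathFirst = false, so the Hadoop classes bundled with Spark are likely to take precedence over a hadoop-common jar passed via --jars, which would explain why hadoop-aws 3.3.6 cannot find a method it expects.

```scala
// Sketch: report the Hadoop version and the jar it was loaded from.
// Run in spark-shell (driver side only).
import org.apache.hadoop.util.VersionInfo

// Version string compiled into the Hadoop classes that were actually loaded.
println(s"Hadoop version on classpath: ${VersionInfo.getVersion}")

// Location of the jar that provided VersionInfo; getCodeSource may be null,
// so it is wrapped in Option.
val source = Option(classOf[VersionInfo].getProtectionDomain.getCodeSource)
  .map(_.getLocation.toString)
  .getOrElse("<unknown>")
println(s"Loaded from: $source")
```

If the reported version differs from the hadoop-aws version passed on the command line, the mismatch is the probable cause of the NoSuchMethodError.
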
So try using:

  • hadoop-common version 3.3.4
  • hadoop-aws version 3.3.4
  • aws-java-sdk-bundle version 1.12.262 (based on this)
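
Applied to the build definition from the question, the versions above could look like the following sketch. The `hadoopVersion` value is a hypothetical helper name introduced here for illustration; everything else is taken from the question with only the three version numbers changed.

```scala
// build.sbt (sketch): Hadoop artifacts pinned to the version Spark 3.5.0 was built against.
scalaVersion := "2.12.18"

// Hypothetical helper so hadoop-common and hadoop-aws cannot drift apart.
val hadoopVersion = "3.3.4"

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-sql" % "3.5.0",
    "mysql" % "mysql-connector-java" % "8.0.33",
    "org.apache.hadoop" % "hadoop-common" % hadoopVersion,
    "org.apache.hadoop" % "hadoop-aws" % hadoopVersion,
    // SDK bundle version suggested in this answer for hadoop-aws 3.3.4.
    "com.amazonaws" % "aws-java-sdk-bundle" % "1.12.262"
)
```

The jar paths given to --jars in the spark-submit command would need to point at the 3.3.4 and 1.12.262 artifacts as well, since those are the jars the workers actually fetch.
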
