Spark on YARN job fails with exitCode: 1 and stderr says "cannot find main class"

tv6aics1 posted on 2021-06-03 in Hadoop

We tried to submit the simple SparkPi example to Spark on YARN. The batch file contains the following:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 1g --executor-cores 1 .\examples\target\spark-examples_2.10-1.4.0.jar 10
pause

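Our HDFS and YARN work fine. We are using Hadoop 2.7.0 and Spark 1.4.1. We have a single node that acts as both namenode and datanode.
When we run the batch file, the job fails and the log shows the following: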

2015-08-21 11:07:22,044 DEBUG [main] | ===============================================================================
2015-08-21 11:07:22,044 DEBUG [main] | Yarn AM launch context:
2015-08-21 11:07:22,044 DEBUG [main] |     user class: org.apache.spark.examples.SparkPi
2015-08-21 11:07:22,044 DEBUG [main] |     env:
2015-08-21 11:07:22,044 DEBUG [main] |         CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__hadoop_conf__<CPS>{{PWD}}/__spark__.jar<CPS>%HADOOP_HOME%\etc\hadoop<CPS>%HADOOP_HOME%\share\hadoop\common\*<CPS>%HADOOP_HOME%\share\hadoop\common\lib\*<CPS>%HADOOP_HOME%\share\hadoop\mapreduce\*<CPS>%HADOOP_HOME%\share\hadoop\mapreduce\lib\*<CPS>%HADOOP_HOME%\share\hadoop\hdfs\*<CPS>%HADOOP_HOME%\share\hadoop\hdfs\lib\*<CPS>%HADOOP_HOME%\share\hadoop\yarn\*<CPS>%HADOOP_HOME%\share\hadoop\yarn\lib\*<CPS>%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\*<CPS>%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\lib\*
2015-08-21 11:07:22,060 DEBUG [main] |         SPARK_YARN_CACHE_FILES_FILE_SIZES -> 165181064,1420218
2015-08-21 11:07:22,060 DEBUG [main] |         SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1440062075415_0026
2015-08-21 11:07:22,060 DEBUG [main] |         SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE,PRIVATE
2015-08-21 11:07:22,060 DEBUG [main] |         SPARK_USER -> msrabi
2015-08-21 11:07:22,060 DEBUG [main] |         SPARK_YARN_MODE -> true
2015-08-21 11:07:22,060 DEBUG [main] |         SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1440126441200,1440126441575
2015-08-21 11:07:22,060 DEBUG [main] |         SPARK_YARN_CACHE_FILES -> hdfs://msra-sa-44:9000/user/msrabi/.sparkStaging/application_1440062075415_0026/spark-assembly-1.4.0-hadoop2.7.0.jar#__spark__.jar,hdfs://msra-sa-44:9000/user/msrabi/.sparkStaging/application_1440062075415_0026/spark-examples_2.10-1.4.0.jar#__app__.jar
2015-08-21 11:07:22,060 DEBUG [main] |     resources:
2015-08-21 11:07:22,060 DEBUG [main] |         __app__.jar -> resource { scheme: "hdfs" host: "msra-sa-44" port: 9000 file: "/user/msrabi/.sparkStaging/application_1440062075415_0026/spark-examples_2.10-1.4.0.jar" } size: 1420218 timestamp: 1440126441575 type: FILE visibility: PRIVATE
2015-08-21 11:07:22,060 DEBUG [main] |         __spark__.jar -> resource { scheme: "hdfs" host: "msra-sa-44" port: 9000 file: "/user/msrabi/.sparkStaging/application_1440062075415_0026/spark-assembly-1.4.0-hadoop2.7.0.jar" } size: 165181064 timestamp: 1440126441200 type: FILE visibility: PRIVATE
2015-08-21 11:07:22,060 DEBUG [main] |         __hadoop_conf__ -> resource { scheme: "hdfs" host: "msra-sa-44" port: 9000 file: "/user/msrabi/.sparkStaging/application_1440062075415_0026/__hadoop_conf__7908628615251032149.zip" } size: 82888 timestamp: 1440126441794 type: ARCHIVE visibility: PRIVATE
2015-08-21 11:07:22,060 DEBUG [main] |     command:
2015-08-21 11:07:22,075 DEBUG [main] |         {{JAVA_HOME}}/bin/java -server -Xmx4096m -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.app.name=org.apache.spark.examples.SparkPi' '-Dspark.executor.memory=1g' '-Dspark.driver.memory=4g' '-Dspark.master=yarn-cluster' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.deploy.yarn.ApplicationMaster --class 'org.apache.spark.examples.SparkPi' --jar file:/D:/sp/./examples/target/spark-examples_2.10-1.4.0.jar --arg '10' --executor-memory 1024m --executor-cores 1 --num-executors  3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
2015-08-21 11:07:22,075 DEBUG [main] | ===============================================================================

...........(omitting some lines)......

2015-08-21 11:07:23,231 INFO [main] | Application report for application_1440062075415_0026 (state: ACCEPTED)
2015-08-21 11:07:23,247 DEBUG [main] | 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1440126442169
     final status: UNDEFINED
     tracking URL: http://msra-sa-44:8088/proxy/application_1440062075415_0026/
     user: msrabi
2015-08-21 11:07:24,263 TRACE [main] | 1: Call -> MSRA-SA-44/10.190.173.181:8032: getApplicationReport {application_id { id: 26 cluster_timestamp: 1440062075415 }}
2015-08-21 11:07:24,263 DEBUG [IPC Parameter Sending Thread #0] | IPC Client (443384617) connection to MSRA-SA-44/10.190.173.181:8032 from msrabi sending #37
2015-08-21 11:07:24,263 DEBUG [IPC Client (443384617) connection to MSRA-SA-44/10.190.173.181:8032 from msrabi] | IPC Client (443384617) connection to MSRA-SA-44/10.190.173.181:8032 from msrabi got value #37
2015-08-21 11:07:24,263 DEBUG [main] | Call: getApplicationReport took 0ms
2015-08-21 11:07:24,263 TRACE [main] | 1: Response <- MSRA-SA-44/10.190.173.181:8032: getApplicationReport {application_report { applicationId { id: 26 cluster_timestamp: 1440062075415 } user: "msrabi" queue: "default" name: "org.apache.spark.examples.SparkPi" host: "N/A" rpc_port: -1 yarn_application_state: ACCEPTED trackingUrl: "http://msra-sa-44:8088/proxy/application_1440062075415_0026/" diagnostics: "" startTime: 1440126442169 finishTime: 0 final_application_status: APP_UNDEFINED app_resource_Usage { num_used_containers: 1 num_reserved_containers: 0 used_resources { memory: 4608 virtual_cores: 1 } reserved_resources { memory: 0 virtual_cores: 0 } needed_resources { memory: 4608 virtual_cores: 1 } memory_seconds: 0 vcore_seconds: 0 } originalTrackingUrl: "N/A" currentApplicationAttemptId { application_id { id: 26 cluster_timestamp: 1440062075415 } attemptId: 1 } progress: 0.0 applicationType: "SPARK" }}
2015-08-21 11:07:24,263 INFO [main] | Application report for application_1440062075415_0026 (state: ACCEPTED)

.......(omitting some lines where the state are all ACCEPTED and final status are all UNDEFINED).....

2015-08-21 11:07:30,359 INFO [main] | Application report for application_1440062075415_0026 (state: FAILED)
2015-08-21 11:07:30,359 DEBUG [main] | 
     client token: N/A
     diagnostics: Application application_1440062075415_0026 failed 2 times due to AM Container for appattempt_1440062075415_0026_000002 exited with  exitCode: 1
For more detailed output, check application tracking page:http://msra-sa-44:8088/cluster/app/application_1440062075415_0026Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1440062075415_0026_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Shell output:         1 file(s) moved.

Then we opened stderr, which says:

Error: Could not find or load main class 'Dspark.app.name=org.apache.spark.examples.SparkPi'

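This is very strange: the string should be an option passed to java, but java seems to treat it as the main class. We expected to see a main-class argument in the command part of the log, but there is none.
How can this happen? What can we do to find out what went wrong?
Thank you!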

dnph8jn4 1#

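We solved the problem.
The root cause is that, when generating the java command line, Spark wraps the arguments in single quotes ('-Dx'). Single quotes work only on Linux. On Windows, arguments are either left unquoted or wrapped in double quotes ("-Dx"). The only way to fix this is to edit Spark's source code and recompile.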
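To illustrate why the quoting matters, here is a minimal Python sketch, not Spark or JVM code. It assumes two simplified rules: a Windows-style command line groups text only with double quotes, so a single quote stays part of the token, and the java launcher takes the first token that does not start with `-` as the main class (the real launcher also handles options that take a separate value, which is ignored here). The function names `split_like_cmd` and `guess_main_class` are hypothetical.

```python
from typing import List, Optional


def split_like_cmd(command_line: str) -> List[str]:
    """Split on whitespace, honoring only double quotes (assumed Windows-like rule)."""
    tokens: List[str] = []
    current = ""
    in_double = False
    for ch in command_line:
        if ch == '"':
            in_double = not in_double  # double quotes group text and are dropped
        elif ch.isspace() and not in_double:
            if current:
                tokens.append(current)
                current = ""
        else:
            current += ch  # a single quote is kept as an ordinary character
    if current:
        tokens.append(current)
    return tokens


def guess_main_class(tokens: List[str]) -> Optional[str]:
    """Return the first token that does not look like a JVM option."""
    for tok in tokens:
        if not tok.startswith("-"):
            return tok
    return None


# Shortened version of the AM launch command from the log above.
single = ("-server -Xmx4096m "
          "'-Dspark.app.name=org.apache.spark.examples.SparkPi' "
          "org.apache.spark.deploy.yarn.ApplicationMaster")
double = single.replace("'", '"')

print(guess_main_class(split_like_cmd(single)))
print(guess_main_class(split_like_cmd(double)))
```

Under these assumptions the single-quoted option is the first token that does not begin with `-`, so it is picked as the main class, while the double-quoted variant resolves to `org.apache.spark.deploy.yarn.ApplicationMaster`.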
This quoting problem appears to be a currently open issue in Spark (https://issues.apache.org/jira/browse/spark-5754).
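As a rough idea of what a platform-aware fix could look like, here is a Python sketch that chooses the quote style by target platform. It is not Spark's actual patch: the names `quote_for_posix`, `quote_for_windows`, and `quote_arg` are hypothetical, and the Windows branch is deliberately simplified (real batch-script escaping has additional rules, for example for `%` and `^`).

```python
def quote_for_posix(arg: str) -> str:
    """Wrap in single quotes; an embedded single quote becomes '\\''."""
    return "'" + arg.replace("'", "'\\''") + "'"


def quote_for_windows(arg: str) -> str:
    """Wrap in double quotes; simplified: an embedded double quote is backslash-escaped."""
    return '"' + arg.replace('"', '\\"') + '"'


def quote_arg(arg: str, is_windows: bool) -> str:
    """Pick the quoting style for the platform that will run the command."""
    return quote_for_windows(arg) if is_windows else quote_for_posix(arg)


opt = "-Dspark.app.name=org.apache.spark.examples.SparkPi"
print(quote_arg(opt, is_windows=False))  # single-quoted form
print(quote_arg(opt, is_windows=True))   # double-quoted form
```

In a real deployment the flag would have to reflect the operating system of the node that launches the container, which is not necessarily the machine that runs spark-submit.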
