MapReduce not contacting RMProxy and getting stuck waiting for the ResourceManager?

olmpazwi  posted on 2021-05-27  in  Hadoop

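I am running MapReduce/Hadoop on EMR with Hadoop 2.7.3, installed on AWS. The jar is built with the Maven Shade plugin. The job gets stuck indefinitely while waiting for the ResourceManager, and I can find nothing about it in the log files or online.
The hang occurs in job.waitForCompletion, which logs the following lines: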

2020-01-25 05:52:41,346 INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl (main): Timeline service address: http://ip-172-31-13-41.us-west-2.compute.internal:8188/ws/v1/timeline/
2020-01-25 05:52:41,356 INFO org.apache.hadoop.yarn.client.RMProxy (main): Connecting to ResourceManager at ip-172-31-13-41.us-west-2.compute.internal/172.31.13.41:8032

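Then it just sits there, never making progress, and I have to shut down the cluster or kill the task manually.
Interestingly, I can reproduce this step locally with hadoop jar <arguments>, but I don't know what causes it.

After 25 minutes or so, the job fails while opening the jar and produces output of the form: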

AM Container for appattempt_1580058321574_0005_000001 exited with exitCode: -1000
For more detailed output, check application tracking page:http://192.168.2.21:8088/cluster/app/application_1580058321574_0005Then, click on links to logs of each attempt.
Diagnostics: /Users/gbronner/hadoopdata/yarn/local/usercache/gbronner/appcache/application_1580058321574_0005/filecache/11_tmp/tmp_job.jar (Is a directory)
java.io.FileNotFoundException: /Users/gbronner/hadoopdata/yarn/local/usercache/gbronner/appcache/application_1580058321574_0005/filecache/11_tmp/tmp_job.jar (Is a directory)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:225)
at java.util.zip.ZipFile.<init>(ZipFile.java:155)
at java.util.jar.JarFile.<init>(JarFile.java:166)
at java.util.jar.JarFile.<init>(JarFile.java:130)
at org.apache.hadoop.util.RunJar.unJar(RunJar.java:94)
at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:297)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:364)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Failing this attempt

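This happens both on AWS EMR and locally. I have never seen this error before, and I am using EMR straight out of the box.
Any idea why this happens? A bad jar? It may be related to another unanswered question.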

ehxuflar  #1

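After trial and error across hundreds of experiments, the offending line appears to be
job.setJar().
Why, I have no idea. It runs fine under IntelliJ, but not when launched with the hadoop command locally.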
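
The trace in the question shows java.util.jar.JarFile being opened on tmp_job.jar, which turned out to be a directory. Whether that is what job.setJar() was pointed at here is not established, so the following is only a minimal sketch of a pre-flight check, assuming the value handed to job.setJar() is a local path that can be inspected before submission. JarPathCheck and describe are hypothetical names for illustration and are not part of Hadoop; the sketch uses only the JDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

/** Hypothetical helper (not Hadoop API): reports what a would-be job jar path really is. */
public class JarPathCheck {

    /** Classifies the path as missing, a directory, a readable jar, or an unreadable file. */
    public static String describe(Path path) {
        if (!Files.exists(path)) {
            return "missing";
        }
        if (Files.isDirectory(path)) {
            // A directory cannot be opened as a jar; the AM container trace
            // in the question fails at this kind of open ("Is a directory").
            return "directory";
        }
        try (JarFile jar = new JarFile(path.toFile())) {
            // size() is the number of entries in the archive
            return "jar with " + jar.size() + " entries";
        } catch (IOException e) {
            // Regular file, but not a valid zip/jar archive
            return "not a readable jar: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Each argument is a candidate jar path (assumption: local filesystem paths)
        for (String arg : args) {
            System.out.println(arg + " -> " + describe(Paths.get(arg)));
        }
    }
}
```

It can be run against the shaded jar before submitting, for example `java JarPathCheck <path-to-shaded-jar>`. Anything other than a "jar with N entries" result suggests the path is not a usable archive.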
