Hadoop streaming job failed (not successful) in Python

So, when I run cat england.txt | ./mapperEngl.py | sort | ./reducerEngl.py, my scripts work perfectly fine.
But when I run:
/shared/hadoop/cur/bin/hadoop jar /shared/hadoop/cur/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -file /home/hadoop/mapperEngl.py -mapper /home/hadoop/mapperEngl.py -file /home/hadoop/reducerEngl.py -reducer /home/hadoop/reducerEngl.py -input /datadir/england.txt -output /outputdir/climateresults3.txt
I get the following error:
16/05/03 09:27:15 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
16/05/03 09:27:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [/home/hadoop/mapperEngl.py, /home/hadoop/reducerEngl.py, /tmp/hadoop-unjar6814867016081507297/] [] /tmp/streamjob1585723008278678599.jar tmpDir=null
16/05/03 09:27:16 INFO client.RMProxy: Connecting to ResourceManager at mgmt-florida-poly-eth0/10.200.209.10:8032
16/05/03 09:27:16 INFO client.RMProxy: Connecting to ResourceManager at mgmt-florida-poly-eth0/10.200.209.10:8032
16/05/03 09:27:17 INFO mapred.FileInputFormat: Total input paths to process : 1
16/05/03 09:27:17 INFO mapreduce.JobSubmitter: number of splits:2
16/05/03 09:27:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1459438007195_0006
16/05/03 09:27:17 INFO impl.YarnClientImpl: Submitted application application_1459438007195_0006
16/05/03 09:27:17 INFO mapreduce.Job: The url to track the job: http://mgmt-florida-poly-eth0:8088/proxy/application_1459438007195_0006/
16/05/03 09:27:17 INFO mapreduce.Job: Running job: job_1459438007195_0006
16/05/03 09:27:25 INFO mapreduce.Job: Job job_1459438007195_0006 running in uber mode : false
16/05/03 09:27:25 INFO mapreduce.Job:  map 0% reduce 0%
16/05/03 09:27:31 INFO mapreduce.Job:  map 50% reduce 0%
16/05/03 09:27:32 INFO mapreduce.Job:  map 100% reduce 0%
16/05/03 09:27:38 INFO mapreduce.Job: Task Id : attempt_1459438007195_0006_r_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
        at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
        at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

16/05/03 09:27:45 INFO mapreduce.Job: Task Id : attempt_1459438007195_0006_r_000000_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
        at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
        at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

16/05/03 09:27:51 INFO mapreduce.Job: Task Id : attempt_1459438007195_0006_r_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
        at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
        at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

16/05/03 09:27:58 INFO mapreduce.Job:  map 100% reduce 100%
16/05/03 09:27:58 INFO mapreduce.Job: Job job_1459438007195_0006 failed with state FAILED due to: Task failed task_1459438007195_0006_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1

16/05/03 09:27:58 INFO mapreduce.Job: Counters: 37
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=228560
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=29265
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
        Job Counters
                Failed reduce tasks=4
                Launched map tasks=2
                Launched reduce tasks=4
                Rack-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=134880
                Total time spent by all reduces in occupied slots (ms)=242432
                Total time spent by all map tasks (ms)=8430
                Total time spent by all reduce tasks (ms)=15152
                Total vcore-seconds taken by all map tasks=8430
                Total vcore-seconds taken by all reduce tasks=15152
                Total megabyte-seconds taken by all map tasks=17264640
                Total megabyte-seconds taken by all reduce tasks=31031296
        Map-Reduce Framework
                Map input records=107
                Map output records=223
                Map output bytes=9014
                Map output materialized bytes=9472
                Input split bytes=202
                Combine input records=0
                Spilled Records=223
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=0
                CPU time spent (ms)=1540
                Physical memory (bytes) snapshot=1305165824
                Virtual memory (bytes) snapshot=5482422272
                Total committed heap usage (bytes)=2022440960
        File Input Format Counters
                Bytes Read=29063
16/05/03 09:27:58 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
[hadoop@mgmt-florida-poly ~]$
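The console output above only reports that the reducer subprocess exited with status 1; whatever the script itself wrote to stderr (including a Python traceback) ends up in the stderr log of the failed task attempt, not here. Since the pipeline works locally, one thing worth comparing is the environment the script sees on the cluster node. The following is a minimal sketch, assuming it is pasted near the top of the reducer script; `describe_environment` and `report_environment` are hypothetical helper names, not part of the original scripts.

```python
import os
import sys


def describe_environment():
    """Return a short report about the interpreter running this script."""
    lines = [
        "python executable: %s" % sys.executable,
        "python version: %s" % sys.version.split()[0],
        "working directory: %s" % os.getcwd(),
        "files in working directory: %s" % ", ".join(sorted(os.listdir("."))),
    ]
    return "\n".join(lines)


def report_environment(stream=sys.stderr):
    """Write the report to stderr so it lands in the task's stderr log."""
    stream.write(describe_environment() + "\n")
```

Calling `report_environment()` once at startup, both locally and in the streaming job, would make differences in interpreter version or shipped files visible without touching stdout, which Hadoop streaming treats as the reducer's output.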
I have tried the solutions from other questions, but they don't seem to work.
So yeah, I'm completely stuck here.
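Because the reducer script is not shown, the cause of the exit status 1 cannot be read off the question. As a point of comparison, here is a sketch of a defensive streaming reducer, under stated assumptions: input lines are `key<TAB>value` with a numeric value (the Hadoop streaming default separator is a tab), and the aggregate is a per-key mean. Both the record layout and the mean are hypothetical stand-ins for whatever reducerEngl.py actually computes. The point it illustrates is skipping malformed lines and reporting them on stderr instead of letting an uncaught exception terminate the process.

```python
import sys
from itertools import groupby


def parse_line(line):
    """Split 'key<TAB>value' into (key, float); return None if malformed."""
    parts = line.rstrip("\n").split("\t", 1)
    if len(parts) != 2:
        return None
    key, raw = parts
    try:
        return key, float(raw)
    except ValueError:
        return None


def valid_records(lines, log):
    """Yield parsed records, logging (not raising on) malformed lines."""
    for number, line in enumerate(lines, 1):
        record = parse_line(line)
        if record is None:
            log.write("skipping malformed line %d: %r\n" % (number, line))
            continue
        yield record


def reduce_stream(lines, log=sys.stderr):
    """Yield (key, mean) for each run of consecutive identical keys.

    Relies on the input being sorted by key, which Hadoop streaming
    guarantees for reducer input and `sort` provides in the local test.
    """
    records = valid_records(lines, log)
    for key, group in groupby(records, key=lambda record: record[0]):
        values = [value for _, value in group]
        yield key, sum(values) / len(values)


def main(stdin=None, stdout=None):
    """Read records from stdin and write 'key<TAB>mean' lines to stdout."""
    stdin = stdin or sys.stdin
    stdout = stdout or sys.stdout
    for key, mean in reduce_stream(stdin):
        stdout.write("%s\t%s\n" % (key, mean))
```

In a real script, `main()` would be called under an `if __name__ == "__main__":` guard, and the file would need a shebang line and execute permission, since the job invokes the script directly by path.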