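I'm getting the error "Streaming Command Failed!" when I try to run mapper.py and reducer.py on some data. The mapper and reducer scripts run on their own, but the streaming job fails.
Here is the mapper code: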
#!/usr/bin/python
import sys

# Emit "<first CSV field>\t1" for every input line.
for line in sys.stdin:
    data = line.strip().split(",")
    key = data[0]
    value = 1
    print("{0}\t{1}".format(key, value))
Here is the reducer code:
#!/usr/bin/python
import sys

total = 0
oldkey = None

# Input arrives sorted by key, so lines with equal keys are adjacent.
for line in sys.stdin:
    data = line.strip().split("\t")
    thiskey = data[0]
    value = data[1]
    # Key changed: emit the finished key's total and reset the counter.
    if thiskey != oldkey and oldkey is not None:
        print("{0}\t{1}".format(oldkey, total))
        total = 0
    oldkey = thiskey
    total += float(value)

# Emit the total for the last key.
if oldkey is not None:
    print("{0}\t{1}".format(oldkey, total))
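To confirm that the map and reduce logic itself is sound independently of Hadoop, the two scripts can be mimicked in a single process. This is only a minimal sketch: `map_lines`, `reduce_pairs`, and `run_local` are hypothetical helper names, `sorted()` stands in for Hadoop's shuffle/sort step, and the sample rows are made up (the real columns of airline_data.csv are not shown in this post).

```python
from itertools import groupby


def map_lines(lines):
    """Mimic mapper.py: yield (first CSV field, 1) for every input line."""
    for line in lines:
        key = line.strip().split(",")[0]
        yield key, 1


def reduce_pairs(pairs):
    """Mimic reducer.py: sum the values of each run of identical keys."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        # reducer.py accumulates with float(), so the total is a float here too.
        yield key, float(sum(value for _, value in group))


def run_local(lines):
    """Run map, sort, and reduce in-process; sorted() replaces the shuffle."""
    return list(reduce_pairs(sorted(map_lines(lines))))


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    sample = ["AA,2020-01-01,JFK", "DL,2020-01-01,ATL", "AA,2020-01-02,LAX"]
    for key, total in run_local(sample):
        print("{0}\t{1}".format(key, total))
```

If this produces one total per key, the scripts are probably fine and the failure lies in how the job is submitted rather than in the Python code.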
This is the command I'm running in the terminal to run the mapper and reducer on the data:
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar -file ./mapper.py -file ./reducer.py -mapper mapper.py -reducer reducer.py -input /usr/bda-p101234/airline_data.csv -output /usr/bda-p101234/query1_output
2020-06-23 17:05:39,399 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [./mapper.py, ./reducer.py, /tmp/hadoop-unjar2898662668096241827/] [] /tmp/streamjob7927337112790687471.jar tmpDir=null
2020-06-23 17:05:40,209 INFO client.RMProxy: Connecting to ResourceManager at lmar/192.168.18.100:8032
2020-06-23 17:05:40,379 INFO client.RMProxy: Connecting to ResourceManager at lmar/192.168.18.100:8032
2020-06-23 17:05:40,736 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/bda-p190311/.staging/job_1592885073926_0001
2020-06-23 17:05:40,885 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-06-23 17:05:41,585 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-06-23 17:05:41,627 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-06-23 17:05:41,713 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/bda-p190311/.staging/job_1592885073926_0001
2020-06-23 17:05:41,736 ERROR streaming.StreamJob: Error Launching job : Input path does not exist: hdfs://lmar:9000/usr/bda-p101234/airline_data.csv
Streaming Command Failed!
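The ERROR line in the log names the immediate cause: the job was rejected at submission because the input path was not found in HDFS. The sketch below pulls that path out of the log text and lists the distinct `bda-p<digits>` identifiers that appear in it. The helper names are hypothetical, and treating those identifiers as user directories is an assumption based only on the paths shown above.

```python
import re


def missing_input_path(log_text):
    """Return the path from an 'Input path does not exist' line, or None."""
    match = re.search(r"Input path does not exist: (\S+)", log_text)
    return match.group(1) if match else None


def user_ids(text):
    """Return the distinct bda-p<digits> identifiers found in the text, sorted."""
    return sorted(set(re.findall(r"bda-p\d+", text)))


if __name__ == "__main__":
    # Two lines copied from the log in this post.
    log = (
        "2020-06-23 17:05:41,713 INFO mapreduce.JobSubmitter: Cleaning up the staging area "
        "/tmp/hadoop-yarn/staging/bda-p190311/.staging/job_1592885073926_0001\n"
        "2020-06-23 17:05:41,736 ERROR streaming.StreamJob: Error Launching job : "
        "Input path does not exist: hdfs://lmar:9000/usr/bda-p101234/airline_data.csv\n"
    )
    print(missing_input_path(log))
    print(user_ids(log))
```

On this log, the staging area uses bda-p190311 while the -input and -output paths use bda-p101234. Whether that mismatch is the actual problem is not certain, but listing the directory with `hdfs dfs -ls` would show whether airline_data.csv really exists at the path given to -input.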
What am I doing wrong? Please help!