Python — running an mrjob job in Hadoop mode: "Error launching job, bad input path: File does not exist"

ds97pgxw · posted 2021-05-31 in Hadoop

I'm running Apache Hadoop 3.1.0 in pseudo-distributed mode, using the default configuration from the wiki.
I created a simple Python program that counts the article tags in the dblp.xml file, posted below:
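(For reference, "default configuration" means roughly the following from the single-node setup guide; I'm posting it in case it matters. This is a sketch of the standard pseudo-distributed settings, and my actual files may differ slightly.)

<!-- core-site.xml, per the single-node guide -->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

<!-- hdfs-site.xml, per the single-node guide -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>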

from mrjob.job import MRJob

class MRArticleCount(MRJob):

    # for each input line, emit the number of closing </article> tags it contains
    def mapper(self, _, line):
        yield "articles", line.count('</article>')

    # sum the per-line counts into a single total for the "articles" key
    def reducer(self, key, counts):
        yield key, sum(counts)

if __name__ == '__main__':
    MRArticleCount.run()

Running it with the command

python articleCounter.py -r hadoop hdfs:///user/hadoop/dblp/dblp.xml

returns

No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in $PATH...
Found hadoop binary: /home/hadoop/hadoop/bin/hadoop
Using Hadoop version 3.1.0
Looking for Hadoop streaming jar in /home/hadoop/hadoop...
Found Hadoop streaming jar: /home/hadoop/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.0.jar
Creating temp directory /tmp/articleCounter.hadoop.20180416.013824.692915
Copying local files to hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.013824.692915/files/...
Running step 1 of 1...
  loaded properties from hadoop-metrics2.properties
  Scheduled Metric snapshot period at 10 second(s).
  JobTracker metrics system started
  JobTracker metrics system already initialized!
  Cleaning up the staging area file:/tmp/hadoop/mapred/staging/hadoop890329391/.staging/job_local890329391_0001
  Error launching job , bad input path : File does not exist: /tmp/hadoop/mapred/staging/hadoop890329391/.staging/job_local890329391_0001/files/articleCounter.py#articleCounter.py
  Streaming Command Failed!
Attempting to fetch counters from logs...
Can't fetch history log; missing job ID
No counters found
Scanning logs for probable cause of failure...
Can't fetch history log; missing job ID
Can't fetch task logs; missing application ID
Step 1 of 1 failed: Command '['/home/hadoop/hadoop/bin/hadoop', 'jar', '/home/hadoop/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.0.jar', '-files', 'hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.013824.692915/files/articleCounter.py#articleCounter.py,hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.013824.692915/files/mrjob.zip#mrjob.zip,hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.013824.692915/files/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/hadoop/dblp/dblp.xml', '-output', 'hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.013824.692915/output', '-mapper', 'sh -ex setup-wrapper.sh python articleCounter.py --step-num=0 --mapper', '-reducer', 'sh -ex setup-wrapper.sh python articleCounter.py --step-num=0 --reducer']' returned non-zero exit status 512

Running it in verbose mode gives me this monstrosity:

Looking for configs in /home/hadoop/.mrjob.conf
Looking for configs in /etc/mrjob.conf
No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Active configuration:
{'bootstrap_mrjob': None,
 'bootstrap_spark': None,
 'check_input_paths': True,
 'cleanup': ['ALL'],
 'cleanup_on_failure': ['NONE'],
 'cmdenv': {},
 'hadoop_bin': None,
 'hadoop_extra_args': [],
 'hadoop_log_dirs': [],
 'hadoop_streaming_jar': None,
 'hadoop_tmp_dir': 'tmp/mrjob',
 'interpreter': None,
 'jobconf': {},
 'label': None,
 'libjars': [],
 'local_tmp_dir': '/tmp',
 'owner': 'hadoop',
 'py_files': [],
 'python_bin': None,
 'setup': [],
 'sh_bin': ['sh', '-ex'],
 'spark_args': [],
 'spark_master': 'yarn',
 'spark_submit_bin': None,
 'steps_interpreter': None,
 'steps_python_bin': None,
 'task_python_bin': None,
 'upload_archives': [],
 'upload_dirs': [],
 'upload_files': []}
Looking for hadoop binary in $PATH...
Found hadoop binary: /home/hadoop/hadoop/bin/hadoop
> /home/hadoop/hadoop/bin/hadoop fs -ls hdfs:///user/hadoop/dblp/dblp.xml
STDOUT: -rw-r--r--   1 hadoop supergroup 2257949018 2018-04-15 04:23 hdfs:///user/hadoop/dblp/dblp.xml
> /home/hadoop/hadoop/bin/hadoop version
Using Hadoop version 3.1.0
> /usr/bin/python /home/hadoop/articleCounter.py --steps
Looking for Hadoop streaming jar in /home/hadoop/hadoop...
Found Hadoop streaming jar: /home/hadoop/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.0.jar
Creating temp directory /tmp/articleCounter.hadoop.20180416.014112.103990
archiving /home/hadoop/.local/lib/python2.7/site-packages/mrjob -> /tmp/articleCounter.hadoop.20180416.014112.103990/mrjob.zip as mrjob/
Writing wrapper script to /tmp/articleCounter.hadoop.20180416.014112.103990/setup-wrapper.sh
WRAPPER: # store $PWD
WRAPPER: __mrjob_PWD=$PWD
WRAPPER: 
WRAPPER: # obtain exclusive file lock
WRAPPER: exec 9>/tmp/wrapper.lock.articleCounter.hadoop.20180416.014112.103990
WRAPPER: python -c 'import fcntl; fcntl.flock(9, fcntl.LOCK_EX)'
WRAPPER: 
WRAPPER: # setup commands
WRAPPER: {
WRAPPER:   export PYTHONPATH=$__mrjob_PWD/mrjob.zip:$PYTHONPATH
WRAPPER: } 0</dev/null 1>&2
WRAPPER: 
WRAPPER: # release exclusive file lock
WRAPPER: exec 9>&-
WRAPPER: 
WRAPPER: # run task from the original working directory
WRAPPER: cd $__mrjob_PWD
WRAPPER: "$@"
> /home/hadoop/hadoop/bin/hadoop fs -mkdir -p hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/
Copying local files to hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/...
  /tmp/articleCounter.hadoop.20180416.014112.103990/mrjob.zip -> hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/mrjob.zip
> /home/hadoop/hadoop/bin/hadoop fs -put /tmp/articleCounter.hadoop.20180416.014112.103990/mrjob.zip hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/mrjob.zip
  /home/hadoop/articleCounter.py -> hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/articleCounter.py
> /home/hadoop/hadoop/bin/hadoop fs -put /home/hadoop/articleCounter.py hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/articleCounter.py
  /tmp/articleCounter.hadoop.20180416.014112.103990/setup-wrapper.sh -> hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/setup-wrapper.sh
> /home/hadoop/hadoop/bin/hadoop fs -put /tmp/articleCounter.hadoop.20180416.014112.103990/setup-wrapper.sh hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/setup-wrapper.sh
Running step 1 of 1...
> /home/hadoop/hadoop/bin/hadoop jar /home/hadoop/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.0.jar -files 'hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/articleCounter.py#articleCounter.py,hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/mrjob.zip#mrjob.zip,hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/setup-wrapper.sh#setup-wrapper.sh' -input hdfs:///user/hadoop/dblp/dblp.xml -output hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/output -mapper 'sh -ex setup-wrapper.sh python articleCounter.py --step-num=0 --mapper' -reducer 'sh -ex setup-wrapper.sh python articleCounter.py --step-num=0 --reducer'
  with environment: [('HOME', '/home/hadoop'), ('LANG', 'C'), ('LOGNAME', 'hadoop'), ('LS_COLORS', 'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:'), ('MAIL', '/var/mail/hadoop'), ('OLDPWD', '/home/hadoop/hadoop/share/hadoop/tools/lib'), ('PATH', '/home/hadoop/hadoop/bin:/home/hadoop/hadoop/sbin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games'), ('PWD', '/home/hadoop'), ('SHELL', '/bin/bash'), ('SHLVL', '1'), ('SSH_CLIENT', '192.168.1.188 40594 22'), ('SSH_CONNECTION', '192.168.1.188 40594 192.168.1.150 22'), ('SSH_TTY', '/dev/pts/2'), ('TERM', 'xterm-256color'), ('USER', 'hadoop'), ('XDG_RUNTIME_DIR', '/run/user/1000'), ('XDG_SESSION_ID', '18543'), ('_', '/usr/bin/python')]
Invoking Hadoop via PTY
  loaded properties from hadoop-metrics2.properties
  Scheduled Metric snapshot period at 10 second(s).
  JobTracker metrics system started
  JobTracker metrics system already initialized!
  Cleaning up the staging area file:/tmp/hadoop/mapred/staging/hadoop108797154/.staging/job_local108797154_0001
  Error launching job , bad input path : File does not exist: /tmp/hadoop/mapred/staging/hadoop108797154/.staging/job_local108797154_0001/files/articleCounter.py#articleCounter.py
  Streaming Command Failed!
Attempting to fetch counters from logs...
Can't fetch history log; missing job ID
No counters found
Scanning logs for probable cause of failure...
Can't fetch history log; missing job ID
Can't fetch task logs; missing application ID
Step 1 of 1 failed: Command '['/home/hadoop/hadoop/bin/hadoop', 'jar', '/home/hadoop/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.0.jar', '-files', 'hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/articleCounter.py#articleCounter.py,hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/mrjob.zip#mrjob.zip,hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/files/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/hadoop/dblp/dblp.xml', '-output', 'hdfs:///user/hadoop/tmp/mrjob/articleCounter.hadoop.20180416.014112.103990/output', '-mapper', 'sh -ex setup-wrapper.sh python articleCounter.py --step-num=0 --mapper', '-reducer', 'sh -ex setup-wrapper.sh python articleCounter.py --step-num=0 --reducer']' returned non-zero exit status 512

The program itself runs perfectly fine inline against a test dataset, but running it through Hadoop fails. I believe the problem is the bad input path when the job is launched, but I have no idea how to fix it. Any help would be greatly appreciated, and I'll gladly provide any configuration files or logs that would help solve this!
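(For completeness, the inline run that works is just mrjob's default local runner against a small sample file, along the lines of the command below; dblp_sample.xml is a stand-in name for my test file.)

python articleCounter.py dblp_sample.xml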
Thanks!
