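I am learning Hadoop and have written map/reduce steps to process some Avro files. I suspect my problem is caused by my Hadoop installation. I am trying to test in standalone mode on my laptop rather than on a distributed cluster.
Here is the bash invocation I use to run the job: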
#!/bin/bash
reducer=/home/hduser/python-hadoop/test/reducer.py
mapper=/home/hduser/python-hadoop/test/mapper.py
avrohdjar=/home/hduser/python-hadoop/test/avro-mapred-1.7.4-hadoop1.jar
avrojar=/home/hduser/hadoop/share/hadoop/tools/lib/avro-1.7.4.jar
hadoop jar ~/hadoop/share/hadoop/tools/lib/hadoop-streaming* \
-D mapreduce.job.name="hd1" \
-libjars ${avrojar},${avrohdjar} \
-files ${avrojar},${avrohdjar},${mapper},${reducer} \
-input ~/tmp/data/* \
-output ~/tmp/data-output \
-mapper ${mapper} \
-reducer ${reducer} \
-inputformat org.apache.avro.mapred.AvroAsTextInputFormat
The output is as follows:
15/04/23 11:02:54 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/04/23 11:02:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/04/23 11:02:54 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
15/04/23 11:02:54 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/home/hduser/tmp/mapred/staging/hduser1337717111/.staging/job_local1337717111_0001
15/04/23 11:02:54 ERROR streaming.StreamJob: Error launching job , bad input path : File does not exist: hdfs://localhost:54310/home/hduser/hadoop/share/hadoop/tools/lib/avro-1.7.4.jar
Streaming Command Failed!
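I have tried many different approaches but don't know what to do next. For some reason, Hadoop cannot find the jar files specified with -libjars. Also, I have successfully run the wordcount example posted here, so my Hadoop installation and configuration work well enough for that. Thanks!
Edit: here are the changes to my hdfs-site.xml: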
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
And here is core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
1 Answer
Your cluster is running in distributed mode. It is trying to find the input at the path below, but that path does not exist.
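The error log in the question shows the jar being looked up at hdfs://localhost:54310/home/hduser/hadoop/share/hadoop/tools/lib/avro-1.7.4.jar, which matches the fs.default.name value in the posted core-site.xml. One thing that may be worth trying is to give the local jars an explicit file:// scheme so they are not resolved against the default file system. The following is only a sketch under that assumption: qualify_local is a hypothetical helper written for this illustration, not part of Hadoop, and whether the file:// form resolves the error on this particular setup has not been verified.

```shell
#!/bin/bash
# Hypothetical helper (not part of Hadoop): turn a local path into a file:// URI
# so it is not interpreted relative to fs.default.name (hdfs://localhost:54310).
qualify_local() {
  case "$1" in
    *://*) printf '%s\n' "$1" ;;                    # already has a scheme, leave unchanged
    /*)    printf 'file://%s\n' "$1" ;;             # absolute local path
    *)     printf 'file://%s/%s\n' "$PWD" "$1" ;;   # relative path, anchored at the current directory
  esac
}

# Jar paths taken from the script in the question.
avrojar=$(qualify_local /home/hduser/hadoop/share/hadoop/tools/lib/avro-1.7.4.jar)
avrohdjar=$(qualify_local /home/hduser/python-hadoop/test/avro-mapred-1.7.4-hadoop1.jar)

# Comma-separated list in the form that -libjars expects.
libjars="${avrojar},${avrohdjar}"
echo "${libjars}"
```

The resulting value would replace ${avrojar},${avrohdjar} in the -libjars argument of the original command. An alternative that keeps the script unchanged is to copy the jars into HDFS at the same paths with hadoop fs -put, so that the hdfs:// location in the error message actually exists.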