I have written a mapper and a reducer for a wordcount program in Python, and they work fine. Here is an example:
echo "hello hello world here hello here world here hello" | wordmapper.py | sort -k1,1 | wordreducer.py
hello 4
here 3
world 2
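The scripts themselves are not shown in the post. For readers who want to reproduce the pipeline above, here is a minimal sketch of what such a mapper and reducer might look like, assuming the usual streaming convention of tab-separated "word, count" records. The function names map_words and reduce_counts are hypothetical and are not taken from wordmapper.py or wordreducer.py.

```python
#!/usr/bin/env python
"""Hypothetical stand-ins for wordmapper.py and wordreducer.py (not the asker's code)."""
from itertools import groupby


def map_words(lines):
    """Mapper step: emit one "word<TAB>1" record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_counts(sorted_records):
    """Reducer step: sum the counts of consecutive records that share a key.

    Assumes the records arrive sorted by key, which `sort -k1,1` or the
    Hadoop shuffle phase would provide.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield "%s %d" % (word, sum(int(count) for _, count in group))


if __name__ == "__main__":
    text = ["hello hello world here hello here world here hello"]
    # sorted() plays the role of `sort -k1,1` between the two steps.
    for line in reduce_counts(sorted(map_words(text))):
        print(line)
```

In a real streaming job the two functions would live in separate scripts that read sys.stdin and print to stdout; they are combined here only so the sketch runs on its own.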
Now, when I try to submit a Hadoop job for a large file, I get an error:
hadoop jar share/hadoop/tools/sources/hadoop-*streaming*.jar -file wordmapper.py -mapper wordmapper.py -file wordreducer.py -reducer wordreducer.py -input /data/1jrl.pdb -output /output/py_jrl
Exception in thread "main" java.lang.ClassNotFoundException: share.hadoop.tools.sources.hadoop-streaming-2.2.0-test-sources.jar
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:249)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
I then tried the following command line (removing the wildcard from the one above):
hadoop jar share/hadoop/tools/sources/hadoop-streaming-2.2.0-sources.jar -file wordmapper.py -mapper wordmapper.py -file wordreducer.py -reducer wordreducer.py -input /data/1jrl.pdb -output /output/py_jrl
Exception in thread "main" java.lang.ClassNotFoundException: -file
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:249)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
Why do these errors occur, and how can I fix them? I am using Hadoop 2.
Thanks!
1 Answer
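At least one of your problems is that you are using the -sources.jar, which contains only the .java source files and cannot be executed. Try using this... If it does not exist, find a hadoop-streaming*.jar that does not have -sources in its file name.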
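As for why the two messages differ: as far as I understand it, hadoop jar treats the first argument as the jar and, when that jar's manifest names no main class, treats the next argument as the class to run. In the first command the wildcard apparently expanded to two jars, so the second path was read as a class name; in the second command -file was. The sketch below shows one way to pick a jar whose name is not marked as sources or tests. The function name pick_streaming_jar is hypothetical, and the share/hadoop/tools/lib path in the example is an assumption about where a Hadoop 2.x distribution keeps the executable jar, so check your own installation.

```python
import fnmatch


def pick_streaming_jar(paths):
    """Return the first hadoop-streaming jar that is not a sources or test jar.

    `paths` is any iterable of candidate file paths; returns None if no
    suitable jar is among them.
    """
    for path in sorted(paths):
        name = path.rsplit("/", 1)[-1]
        if not fnmatch.fnmatch(name, "hadoop-streaming*.jar"):
            continue  # not a streaming jar at all
        if "sources" in name or "test" in name:
            continue  # source or test archive, nothing runnable inside
        return path
    return None


if __name__ == "__main__":
    # The first two paths come from the question; the third is an assumed location.
    candidates = [
        "share/hadoop/tools/sources/hadoop-streaming-2.2.0-sources.jar",
        "share/hadoop/tools/sources/hadoop-streaming-2.2.0-test-sources.jar",
        "share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar",
    ]
    print(pick_streaming_jar(candidates))
```

In practice the candidate list could come from a directory walk over the Hadoop installation, and the returned path would replace the wildcard in the hadoop jar command so that exactly one jar is passed.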