我在hadoop流媒体中运行一个简单的python代码时遇到了一个问题。我尝试了所有的建议,在以前的职位与类似的错误,仍然有问题。
添加了usr/bin/env python
chmoda+xMap器和python代码
在python mapper.py-n 1-r 0.4中为-mapper放入“”
我已经在外面运行了代码,效果很好。
更新:我使用以下代码在hadoop流之外运行代码:
cat file |python mapper.py -n 5 -r 0.4 |sort|python reducer.py -f 3618
这很管用。。但是现在我需要运行它到hadoop流媒体
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
-D mapreduce.job.reduces=5 \
-files lr \
-mapper "python lr/mapper.py -n 5 -r 0.4" \
-reducer "python lr/reducer.py -f 3618" \
-input training \
-output models
hadoop流媒体是失败的。我看了看日志,上面没有任何东西告诉我为什么会这样?
我有以下mapper.py:
# !/usr/bin/env python
import sys
import random
from optparse import OptionParser
parser = OptionParser()
parser.add_option("-n", "--model-num", action="store", dest="n_model",
help="number of models to train", type="int")
parser.add_option("-r", "--sample-ratio", action="store", dest="ratio",
help="ratio to sample for each ensemble", type="float")
options, args = parser.parse_args(sys.argv)
random.seed(8803)
r = options.ratio
for line in sys.stdin:
# TODO
# Note: The following lines are only there to help
# you get started (and to have a 'runnable' program).
# You may need to change some or all of the lines below.
# Follow the pseudocode given in the PDF.
key = random.randint(0, options.n_model-1)
value = line.strip()
for i in range(1, options.n_model+1):
m = random.random()
if m < r:
print "%d\t%s" % (i, value)
还有我的减速机.py:
# !/usr/bin/env python
import sys
import pickle
from optparse import OptionParser
from lrsgd import LogisticRegressionSGD
from utils import parse_svm_light_line
parser = OptionParser()
parser.add_option("-e", "--eta", action="store", dest="eta",
default=0.01, help="step size", type="float")
parser.add_option("-c", "--Regularization-Constant", action="store", dest="C",
default=0.0, help="regularization strength", type="float")
parser.add_option("-f", "--feature-num", action="store", dest="n_feature",
help="number of features", type="int")
options, args = parser.parse_args(sys.argv)
classifier = LogisticRegressionSGD(options.eta, options.C, options.n_feature)
for line in sys.stdin:
key, value = line.split("\t", 1)
value = value.strip()
X, y = parse_svm_light_line(value)
classifier.fit(X, y)
pickle.dump(classifier, sys.stdout)
当我在代码外运行它时,它运行正常,但当我在hadoop流媒体中运行它时,它会给我错误:
17/02/07 07:44:34 INFO mapreduce.Job: Task Id : attempt_1486438814591_0038_m_000001_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
1条答案
按热度按时间zmeyuzjn1#
使用harishanker在文章中给出的答案-如何解析java.lang.runtimeexception:pipemapred.waitoutputthreads():子进程失败,代码为2?
确保mapper和reducer文件都可以使用chmod执行(例如:“chmod 744 mapper.py”)
然后执行如下流命令:
现在应该可以了。如果没有,请评论。