I'm working through the RHadoop tutorial at https://github.com/revolutionanalytics/rmr2/blob/master/docs/tutorial.md. When I run the second example, I get errors I can't resolve. Here is the code:
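library(rmr2)
# 50 draws from Binomial(size = 32, prob = 0.4)
groups <- rbinom(32, n = 50, prob = 0.4)
groupsdfs <- to.dfs(groups)
mapreduceResult <- mapreduce(
  input = groupsdfs,
  # emit each value as a key with count 1
  map = function(., v) keyval(v, 1),
  # add up the counts for each distinct value
  reduce = function(k, vv) keyval(k, sum(vv)))
from.dfs(mapreduceResult)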
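To check what the job should return without involving Hadoop at all, the map/shuffle/reduce logic can be emulated in plain R. This is a minimal sketch, not rmr2's implementation: `set.seed(42)` is an arbitrary assumption added only for reproducibility, and `keys`, `vals` and `counts` are hypothetical names. Whatever `from.dfs(mapreduceResult)` returns should carry the same per-value counts as `table(groups)`.

```r
# Plain-R emulation of the tutorial job (no Hadoop, no rmr2).
set.seed(42)  # hypothetical seed, only so the run is reproducible
groups <- rbinom(32, n = 50, prob = 0.4)

# map phase: each input value v becomes a key paired with the value 1
keys <- groups
vals <- rep(1, length(groups))

# shuffle + reduce phase: group the 1s by key and sum each group
counts <- tapply(vals, keys, sum)
print(counts)
```

If rmr2 is installed, `rmr.options(backend = "local")` should run the original `mapreduce()` call without Hadoop. If that works while the cluster run fails, the problem is likely in the cluster environment rather than in the R code.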
The map phase succeeds, but the reduce tasks fail. Here is part of the error output:
14/07/24 11:22:59 INFO mapreduce.Job: map 100% reduce 58%
14/07/24 11:23:01 INFO mapreduce.Job: Task Id : attempt_1406189659246_0001_r_000016_1, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 9 more
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
at org.apache.hadoop.streaming.PipeReducer.configure(PipeReducer.java:67)
... 14 more
Caused by: java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
... 15 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
... 16 more
14/07/24 11:23:42 INFO mapreduce.Job: Job job_1406189659246_0001 failed with state FAILED due to: Task failed task_1406189659246_0001_r_000007
Job failed as tasks failed. failedMaps:0 failedReduces:1
14/07/24 11:23:42 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=1631
FILE: Number of bytes written=2036200
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1073
HDFS: Number of bytes written=5198
HDFS: Number of read operations=67
HDFS: Number of large read operations=0
HDFS: Number of write operations=38
Job Counters
Failed map tasks=2
Failed reduce tasks=28
Killed reduce tasks=1
Launched map tasks=4
Launched reduce tasks=48
Other local map tasks=2
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=18216
Total time spent by all reduces in occupied slots (ms)=194311
Total time spent by all map tasks (ms)=18216
Total time spent by all reduce tasks (ms)=194311
Total vcore-seconds taken by all map tasks=18216
Total vcore-seconds taken by all reduce tasks=194311
Total megabyte-seconds taken by all map tasks=18653184
Total megabyte-seconds taken by all reduce tasks=198974464
Map-Reduce Framework
Map input records=3
Map output records=25
Map output bytes=2196
Map output materialized bytes=2266
Input split bytes=214
Combine input records=0
Combine output records=0
Reduce input groups=10
Reduce shuffle bytes=1859
Reduce input records=21
Reduce output records=30
Spilled Records=46
Shuffled Maps =38
Failed Shuffles=0
Merged Map outputs=38
GC time elapsed (ms)=1339
CPU time spent (ms)=40060
Physical memory (bytes) snapshot=5958418432
Virtual memory (bytes) snapshot=33795457024
Total committed heap usage (bytes)=7176978432
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=859
File Output Format Counters
Bytes Written=5198
rmr
reduce calls=10
14/07/24 11:23:42 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
Can anyone help? I'm stuck. Thanks.
2 answers

Answer 1:
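Problem solved. R and the RHadoop-related packages must be installed on every node in the cluster, not just the one you submit from. The reduce tasks failed with `Cannot run program "Rscript"` because the nodes running them had no R installation. For RHadoop questions, it's better to post in their Google group, https://groups.google.com/forum/#!forum/rhadoop, where you can usually get hints quickly.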
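One way to confirm that a node is ready is a small check run on each node. This is a sketch under assumptions: `check_r_task_env` is a hypothetical helper, not part of rmr2. It only tests whether `Rscript` is on the PATH of the R session running it and whether the listed packages load. The PATH seen by Hadoop task processes may differ from an interactive shell's PATH.

```r
# Hypothetical helper (not part of rmr2): reports whether this machine
# looks able to run rmr2 tasks.
check_r_task_env <- function(pkgs = "rmr2") {
  # Sys.which() returns "" when the program is not found on the PATH
  rscript_found <- nzchar(Sys.which("Rscript")[[1]])
  # requireNamespace() returns FALSE if a package (or one of its
  # dependencies) cannot be loaded
  pkgs_ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  c(Rscript = rscript_found, pkgs_ok)
}

print(check_r_task_env())
```

Any `FALSE` in the output on a worker node points to a missing installation there.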
Answer 2:
Here is a working wordcount example (it runs on the Cloudera sandbox 4.6/5/5.1). The important part is the init at the beginning! ;)