r+hadoop和rhadoop作业在单机集群上失败

egmofgnx  于 2021-06-03  发布在  Hadoop
关注(0)|答案(4)|浏览(403)

为自己是个新手,可能会问一些愚蠢的问题提前道歉。我已经在单机集群上安装了hadoop(ubuntu14.04),并成功地测试了apache安装指南中指定的非常基本的程序。随后,我安装了r、rstudio以及rhdfs、rmr2和所有依赖项包。
然后我尝试运行以下程序:

Sys.setenv(HADOOP_CMD="/usr/local/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop/contrib/streaming/hadoop-streaming-1.2.1.jar")
library('rhdfs')
library('rmr2')
hdfs.init()
small.ints = to.dfs(1:10)
mapreduce(
  input = small.ints, 
  map = function(k, v)
  {
    lapply(seq_along(v), function(r){
      x <- runif(v[[r]])
      keyval(r,c(max(x),min(x)))
    })})

作业失败,控制台上的输出如下

packageJobJar: [/tmp/RtmprPBBS1/rmr-local-env242520fb4125, /tmp/RtmprPBBS1/rmr-global-env24252518202b, /tmp/RtmprPBBS1/rmr-streaming-map24255b97931e, /tmp/hadoop-hduser/hadoop-unjar4430970496737933525/] [] /tmp/streamjob6651310557292596411.jar tmpDir=null
14/05/05 09:16:08 INFO mapred.FileInputFormat: Total input paths to process : 1
14/05/05 09:16:08 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hduser/mapred/local]
14/05/05 09:16:08 INFO streaming.StreamJob: Running job: job_201405050557_0013
14/05/05 09:16:08 INFO streaming.StreamJob: To kill this job, run:
14/05/05 09:16:08 INFO streaming.StreamJob: /usr/local/hadoop/libexec/../bin/hadoop job  -Dmapred.job.tracker=localhost:54311 -kill job_201405050557_0013
14/05/05 09:16:08 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201405050557_0013
14/05/05 09:16:09 INFO streaming.StreamJob:  map 0%  reduce 0%
14/05/05 09:16:41 INFO streaming.StreamJob:  map 100%  reduce 100%
14/05/05 09:16:41 INFO streaming.StreamJob: To kill this job, run:
14/05/05 09:16:41 INFO streaming.StreamJob: /usr/local/hadoop/libexec/../bin/hadoop job  -Dmapred.job.tracker=localhost:54311 -kill job_201405050557_0013
14/05/05 09:16:41 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201405050557_0013
14/05/05 09:16:41 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201405050557_0013_m_000001
14/05/05 09:16:41 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
  hadoop streaming failed with error code 1

stderror日志如下

Error in library(functional) : there is no package called ‘functional’
No traceback available 
Error during wrapup: 
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

我尝试了其他一些简单的演示程序,结果是一样的。看来问题出在我的配置上。
“functional”包已安装,正在自动加载。即使手动加载,也无济于事。所以这很可能不是问题所在。
任何帮助或建议我都会欣然接受。
我在ubuntu14.04上以单集群模式运行hadoop1.2.1、r3.0.5和rstudio0.98.507 java is oracle7java版本1.7.0\u55
hadoop安装似乎还可以,因为我的常规wordcount程序运行良好。
即使是最简单的rhadoop演示,我也能得到相同的结果
这可能是我的机器容量有问题吗?在稍微高端的笔记本电脑上运行?2.8 gib内存和intel® 核心™ i3-2310m cpu@2.10ghz× 4处理器
我现在已经转到hadoop2.2.0,并使用本教程成功地安装了hadoop2.2.0。计算pi的演示程序执行时没有错误。
然后我执行了这个非常简单的mr程序

Sys.setenv(HADOOP_CMD="/usr/local/hadoop220/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop220/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar")
library('rhdfs')
library('rmr2')
library('functional')
hdfs.init()
small.ints = to.dfs(1:10)
mapreduce(
  input = small.ints, 
  map = function(k, v) cbind(v, v^2))

程序执行到第7行,但在所有重要的mr步骤中失败,出现以下错误[仅显示错误的最后一部分]

14/05/06 13:53:36 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/05/06 13:53:36 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/05/06 13:53:37 INFO mapred.FileInputFormat: Total input paths to process : 1
14/05/06 13:53:37 INFO mapreduce.JobSubmitter: number of splits:2
14/05/06 13:53:37 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/05/06 13:53:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1399363749415_0002
14/05/06 13:53:38 INFO impl.YarnClientImpl: Submitted application application_1399363749415_0002 to ResourceManager at /0.0.0.0:8032
14/05/06 13:53:38 INFO mapreduce.Job: The url to track the job: http://yantrajaal:8088/proxy/application_1399363749415_0002/
14/05/06 13:53:38 INFO mapreduce.Job: Running job: job_1399363749415_0002
14/05/06 13:53:45 INFO mapreduce.Job: Job job_1399363749415_0002 running in uber mode : false
14/05/06 13:53:45 INFO mapreduce.Job:  map 0% reduce 0%
14/05/06 13:53:57 INFO mapreduce.Job:  map 100% reduce 0%
14/05/06 13:53:57 INFO mapreduce.Job: Task Id : attempt_1399363749415_0002_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)

            ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

14/05/06 13:54:31 INFO mapreduce.Job:  map 100% reduce 0%
14/05/06 13:54:32 INFO mapreduce.Job: Job job_1399363749415_0002 failed with state FAILED due to: Task failed task_1399363749415_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

14/05/06 13:54:32 INFO mapreduce.Job: Counters: 10
    Job Counters 
        Failed map tasks=7
        Killed map tasks=1
        Launched map tasks=8
        Other local map tasks=6
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=72476
        Total time spent by all reduces in occupied slots (ms)=0
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
14/05/06 13:54:32 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
  hadoop streaming failed with error code 1

我真不知下一步该怎么办!
如有任何关于前进道路的建议,我们将不胜感激。我的怀疑是rhadoop可能还不能适应Ubuntu14.04,但这只是猜测

zdwk9cvp

zdwk9cvp1#

问题是,当您以非根用户的身份安装包时,它们最终会出现在一个私有目录中。这就是所有问题的原因。解决方案是以root用户或超级用户身份登录,然后安装软件包,使它们最终进入系统范围的r库,在我的示例中是/usr/lib64/r/library。在此之后,就没有任何问题了。程序会起作用的!

sxpgvts3

sxpgvts32#

您的单机集群上的r设置似乎有错误。
集群上是否安装了r包?

a7qyws3x

a7qyws3x3#

我用下面的方法解决了和你类似的问题。
看看你的r库

.libPaths()

使用以下命令检查已安装到哪个库包函数:

system.file(package="functional")

如果它安装在个人库中,而不是安装在所有用户共用的库中,作业将失败,并错误地表示无法加载包。
希望这会有帮助。
干杯
赵延长
rdatamining.com

v6ylcynt

v6ylcynt4#

启动您的终端并使用超级用户或root用户登录 sudo su root 那么 start R 在终端中,使用以下命令安装rhadoop包
install.packages(c(“codetools”,“r”,“rcpp”,“rjsonio”,“bitops”,“digest”,“functional”,“stringr”,“plyr”,“reformae2”,“rjava”))install.packages(c(“dplyr”,“r.methodss3”))install.packages(c(“hmisc”))install.packages(c(“catools”))sys.setenv(hadoop\u home=“/usr/local/hadoop”)sys.setenv(hadoop\u cmd=“/usr/local/hadoop/bin/hadoop”) Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/tools/lib/hadoopversiomentionhere.jar") 之后安装 rmr2 rhdfs2 在这里
之后,使用以下命令安装这些下载的源文件 install.packages(path_to_file, repos = NULL, type="source") 现在安装后,关闭端子r,然后打开端子 rstudio 运行r代码流错误将得到解决,因为上述步骤将在全局文件夹中安装r库。
或者如果你想你可以安装r本身作为一个超级用户为安全的一方希望这有帮助

相关问题