How to submit a Mahout recommendation job using the HDInsight .NET SDK

bxgwgixi  posted on 2021-06-04  in  Hadoop

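I'm new to this. I want to learn and practice machine learning, and HDInsight is exactly what I'm looking for, but there doesn't seem to be a direct API for Mahout. Since a Mahout recommendation is essentially translated into MapReduce jobs, I followed some of the MapReduce examples in the Windows Azure documentation and wrote the following code: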

// Define the MapReduce job
MapReduceJobCreateParameters mrJobDefinition = new MapReduceJobCreateParameters()
{
    JarFile = "wasb:///example/jars/mahout-core-0.9-job.jar",
    ClassName = "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob",
};

mrJobDefinition.Arguments.Add(" -s SIMILARITY_COOCCURRENCE");
mrJobDefinition.Arguments.Add(" --input=/reply");
mrJobDefinition.Arguments.Add(" --output=/recommend/");
mrJobDefinition.Arguments.Add(" --usersFile=/data/users.txt");

I have uploaded mahout-core-0.9-job.jar to /example/jars in the specified Azure Blob storage container.
But I get the following error message:
14/04/03 12:04:28 ERROR security.UserGroupInformation: PriviledgedActionException as:johnny cause:java.io.IOException: Exception reading file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken
java.security.PrivilegedActionException: java.io.IOException: Exception reading file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1233)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:951)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:77)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:164)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:322)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.io.IOException: Exception reading file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:136)
    at org.apache.hadoop.mapred.JobClient.readTokensFromFiles(JobClient.java:2149)
    at org.apache.hadoop.mapred.JobClient.populateTokenCache(JobClient.java:2185)
    at org.apache.hadoop.mapred.JobClient.access$300(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:964)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:951)
    ... 16 more
Caused by: java.io.FileNotFoundException: File file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:427)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:254)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:130)
    ... 21 more
Exception in thread "main" java.io.IOException: Exception reading file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:136)
    at org.apache.hadoop.mapred.JobClient.readTokensFromFiles(JobClient.java:2149)
    at org.apache.hadoop.mapred.JobClient.populateTokenCache(JobClient.java:2185)
    at org.apache.hadoop.mapred.JobClient.access$300(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:964)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:951)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1233)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:951)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:77)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:164)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:322)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.io.FileNotFoundException: File file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:427)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:254)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:130)
    ... 21 more
Forcibly shut down the watcher/keep-alive thread pool
templeton: job failed with exit code 1
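After searching the internet, it seems I should make some changes to mapred-site.xml or other Hadoop configuration files. But I'm completely new to Apache Hadoop and don't know much about Linux or Java.
Any help or guidance would be greatly appreciated.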

vybvopom  #1

Using the latest .NET SDK for Hadoop (http://hadoopsdk.codeplex.com/), I was able to submit the Mahout job successfully with the same code. It looks like this problem has been fixed in the SDK.
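
For reference, here is a rough sketch of how the job definition from the question might be submitted and polled with the SDK's job client. It follows the certificate-based pattern from the Windows Azure MapReduce samples and is not a verified recipe for every SDK version. The subscription ID, certificate friendly name, and cluster name are hypothetical placeholders, and the class name SubmitMahoutJob is made up for illustration.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography.X509Certificates;
using System.Threading;
using Microsoft.Hadoop.Client; // from the .NET SDK for Hadoop NuGet packages

class SubmitMahoutJob
{
    static void Main()
    {
        // Hypothetical placeholders: these must be replaced with real values
        // (new Guid(...) below throws on the placeholder text).
        string subscriptionId = "<Azure subscription ID>";
        string certFriendlyName = "<management certificate friendly name>";
        string clusterName = "<HDInsight cluster name>";

        // Same job definition and arguments as in the question.
        var mrJobDefinition = new MapReduceJobCreateParameters
        {
            JarFile = "wasb:///example/jars/mahout-core-0.9-job.jar",
            ClassName = "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob"
        };
        mrJobDefinition.Arguments.Add(" -s SIMILARITY_COOCCURRENCE");
        mrJobDefinition.Arguments.Add(" --input=/reply");
        mrJobDefinition.Arguments.Add(" --output=/recommend/");
        mrJobDefinition.Arguments.Add(" --usersFile=/data/users.txt");

        // Look up the management certificate in the current user's store.
        var store = new X509Store();
        store.Open(OpenFlags.ReadOnly);
        X509Certificate2 cert = store.Certificates
            .Cast<X509Certificate2>()
            .First(c => c.FriendlyName == certFriendlyName);

        var creds = new JobSubmissionCertificateCredential(
            new Guid(subscriptionId), cert, clusterName);

        // Connect to the cluster and submit the job.
        var jobClient = JobSubmissionClientFactory.Connect(creds);
        JobCreationResults results = jobClient.CreateMapReduceJob(mrJobDefinition);

        // Poll every 10 seconds until the job reports Completed or Failed.
        // Assumption: no other terminal status needs handling here.
        JobDetails job = jobClient.GetJob(results.JobId);
        while (job.StatusCode != JobStatusCode.Completed &&
               job.StatusCode != JobStatusCode.Failed)
        {
            Thread.Sleep(TimeSpan.FromSeconds(10));
            job = jobClient.GetJob(job.JobId);
        }

        Console.WriteLine("Job {0} finished with status {1}", job.JobId, job.StatusCode);
    }
}
```

If the job still fails with the jobToken error, checking which SDK package version the project references would be the first thing to try.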
