How to run a MapReduce program on a remote cluster from a local IDE

laximzn5, posted 2021-06-02 in Hadoop

I have a simple MapReduce program that I want to run on a remote cluster. Running it from the command line works:

hadoop jar myjar.jar input output

However, when I run, from the IDE, a JUnit test case class that calls the function which launches the MR job, I get the following warnings:

WARN org.apache.hadoop.mapreduce.JobSubmitter  - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
 INFO org.apache.hadoop.mapred.YARNRunner  - Job jar is not present. Not adding any jar to the list of resources.

This happens even though I set the following line before submitting the MR job:

job.setJarByClass(MyJob.class);

As a result, the job fails because the tasks cannot find the classes they need (for example MyMapKey, the mapper's key class):

Error: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :java.lang.RuntimeException: java.lang.ClassNotFoundException: Class MyMapKey not found
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:414)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Any ideas?
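The warnings are consistent with how setJarByClass works: it looks for the jar that contains the given class, and when the class was loaded from a directory (as it is when tests run from an IDE's build output), it finds none, so no jar is shipped and the task JVMs cannot load MyMapKey. As far as I know, Hadoop's JobConf.setJarByClass delegates to a similar check in org.apache.hadoop.util.ClassUtil.findContainingJar. The sketch below is a simplified, hypothetical plain-Java re-implementation of that lookup (the JarLocator class and its methods are made-up names, not Hadoop's source), shown only to illustrate why nothing is found outside a jar.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLDecoder;
import java.util.Enumeration;

public class JarLocator {

    // "com.example.MyMapKey" -> "com/example/MyMapKey.class" (nested classes keep their '$').
    static String classResourceName(Class<?> clazz) {
        return clazz.getName().replace('.', '/') + ".class";
    }

    // Hypothetical, simplified "which jar holds this class?" lookup.
    // Returns null when the class comes from a directory (typical when running from an IDE)
    // or from the JDK's bootstrap loader.
    static String findContainingJar(Class<?> clazz) {
        ClassLoader loader = clazz.getClassLoader();
        if (loader == null) {
            return null; // bootstrap class such as java.lang.String
        }
        try {
            Enumeration<URL> urls = loader.getResources(classResourceName(clazz));
            while (urls.hasMoreElements()) {
                URL url = urls.nextElement();
                if ("jar".equals(url.getProtocol())) {
                    // The path looks like "file:/path/to/app.jar!/com/example/MyMapKey.class".
                    String path = url.getPath();
                    int bang = path.indexOf("!/");
                    String jarPart = bang >= 0 ? path.substring(0, bang) : path;
                    if (jarPart.startsWith("file:")) {
                        jarPart = jarPart.substring("file:".length());
                    }
                    // Protect literal '+' characters, which URLDecoder would turn into spaces.
                    return URLDecoder.decode(jarPart.replace("+", "%2B"), "UTF-8");
                }
            }
            return null; // only directory (or other non-jar) locations were found
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String jar = findContainingJar(JarLocator.class);
        System.out.println(jar == null
                ? "JarLocator was not loaded from a jar, so there is nothing to ship"
                : "JarLocator was loaded from " + jar);
    }
}
```

Run from an IDE, main should report that the class was not loaded from a jar; run with `java -cp some.jar JarLocator`, it should print that jar's path.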

Answer 1 (ws51t4hk):

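First, add the remote Hadoop cluster's configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml and ssl-client.xml) to your Configuration object as resources. Then follow the steps in the link above to see how to manually add the job jar to the classpath on the remote cluster.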
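A minimal sketch of the client-side preparation, assuming the cluster's XML files have been copied into a local directory and the IDE compiles classes into a directory such as target/classes (both locations, and the RemoteSubmitPrep class, are hypothetical). Packing the compiled classes into a temporary jar and passing its path to Job#setJar(String) is one common way to ship user classes when running from an IDE; it is a swapped-in technique, not necessarily the procedure behind the link mentioned in the answer. The code uses only the JDK so it runs without Hadoop; the Hadoop calls appear in comments.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RemoteSubmitPrep {

    // Client-side files named in the answer, copied from the remote cluster.
    static final String[] SITE_FILES = {
        "core-site.xml", "hdfs-site.xml", "mapred-site.xml", "yarn-site.xml", "ssl-client.xml"
    };

    // Returns the site files present in confDir, in the order above. In the driver, each one
    // would be passed to conf.addResource(new org.apache.hadoop.fs.Path(file.toString())).
    static List<Path> existingSiteFiles(Path confDir) {
        List<Path> found = new ArrayList<>();
        for (String name : SITE_FILES) {
            Path p = confDir.resolve(name);
            if (Files.isRegularFile(p)) {
                found.add(p);
            }
        }
        return found;
    }

    // Packs every regular file under classesDir into jarFile with '/'-separated entry names.
    static void jarDirectory(Path classesDir, Path jarFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(classesDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (OutputStream out = Files.newOutputStream(jarFile);
             JarOutputStream jar = new JarOutputStream(out, manifest)) {
            for (Path file : files) {
                String entryName = classesDir.relativize(file).toString().replace('\\', '/');
                jar.putNextEntry(new JarEntry(entryName));
                Files.copy(file, jar);
                jar.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default: a Maven-style build output directory.
        Path classesDir = Paths.get(args.length > 0 ? args[0] : "target/classes");
        if (!Files.isDirectory(classesDir)) {
            System.out.println("No classes directory at " + classesDir.toAbsolutePath());
            return;
        }
        Path jarFile = Files.createTempFile("mr-job-", ".jar");
        jarDirectory(classesDir, jarFile);
        // In the Hadoop driver: job.setJar(jarFile.toString());
        System.out.println("Job jar: " + jarFile.toAbsolutePath());
    }
}
```

In the JUnit test, the jar would be built right before submission, and job.setJar(...) would be called with its path instead of relying on job.setJarByClass(MyJob.class), which finds no jar when classes come from a directory.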
