Hadoop job.setJar does not work with a jar on HDFS

Asked by k97glaaz on 2021-06-03 in Hadoop

I am trying to troubleshoot my Hadoop application, which fails with a java.lang.ClassNotFoundException:

WARN mapreduce.FaunusCompiler: Using the distribution Faunus job jar: ../lib/faunus-0.4.4-hadoop2-job.jar
INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: VerticesMap.Map > CountMapReduce.Map > CountMapReduce.Reduce
INFO mapreduce.FaunusCompiler: Job data location: output/job-0
INFO client.RMProxy: Connecting to ResourceManager at yuriys-bigdata3/172.31.8.161:8032
WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner
INFO input.FileInputFormat: Total input paths to process : 1
INFO mapreduce.JobSubmitter: number of splits:1
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1402963354379_0016
INFO impl.YarnClientImpl: Submitted application application_1402963354379_0016
INFO mapreduce.Job: The url to track the job: http://local-bigdata3:8088/proxy/application_1402963354379_0016/
INFO mapreduce.Job: Running job: job_1402963354379_0016
INFO mapreduce.Job: Job job_1402963354379_0016 running in uber mode : false
INFO mapreduce.Job:  map 0% reduce 0%
INFO mapreduce.Job: Task Id : attempt_1402963354379_0016_m_000000_0, Status : FAILED     

 Error: java.lang.ClassNotFoundException:
 com.tinkerpop.blueprints.util.DefaultVertexQuery
         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
         at java.security.AccessController.doPrivileged(Native Method)
         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
         at java.lang.ClassLoader.defineClass1(Native Method)
         at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
         at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
         at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
         at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
         at java.security.AccessController.doPrivileged(Native Method)
         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
         at com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat.setConf(GraphSONInputFormat.java:39)
         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:726)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:415)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

The application does build a "fat" jar in which all dependency jars, including the one containing the class that is not found, are packaged under the lib directory, and it does call job.setJar with that fat jar.
The code does nothing unusual:

job.setJar(hadoopFileJar);
// ...
boolean success = job.waitForCompletion(true);
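Incidentally, the JobSubmitter warning in the log above ("Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner") points at the conventional way to ship extra dependency jars: submit through ToolRunner so that GenericOptionsParser handles -libjars, which localizes the listed jars and puts them on each task's classpath. A minimal sketch of that pattern, not the poster's actual code (the class name MyFaunusDriver and the job wiring are assumptions, and it needs hadoop-client on the compile classpath):

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyFaunusDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() carries the options parsed by GenericOptionsParser,
        // including any -libjars entries, which YARN localizes and adds
        // to the task classpath.
        Job job = Job.getInstance(getConf(), "faunus-count");
        job.setJarByClass(MyFaunusDriver.class);
        // ... input/output formats, mapper/reducer setup ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // e.g. hadoop jar app.jar MyFaunusDriver -libjars blueprints-core.jar in out
        System.exit(ToolRunner.run(new MyFaunusDriver(), args));
    }
}
```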

I also checked the configuration in yarn-site.xml and verified that the job directory under yarn.nodemanager.local-dirs contains the jar (renamed to job.jar) as well as a lib directory with the extracted jars.
That is, the jar containing the missing class is there. YARN/MR recreates this directory with all the required files after every job submission, so the files do get transferred there.
So far I have found that the CLASSPATH environment variable of the Java worker process running the failing code is set to
c:\hdp\data\hadoop\local\usercache\user\appcache\application_1402963354379_0013\container_1402963354379_0013_02_000001\classpath-3824944728798396318.jar
This jar contains only a MANIFEST.MF, whose Class-Path points to the directory holding the "fat" jar and its subdirectories (original formatting preserved):

file:/c:/hdp/data/hadoop/loc al/usercache/user/appcache/application_1402963354379_0013/container
_1402963354379_0013_02_000001/job.jar/job.jar file:/c:/hdp/data/hadoo p/local/usercache/user/appcache/application_1402963354379_0013/cont ainer_1402963354379_0013_02_000001/job.jar/classes/ file:/c:/hdp/data /hadoop/local/usercache/user/appcache/application_1402963354379_001 3/container_1402963354379_0013_02_000001/jobSubmitDir/job.splitmetain fo file:/c:/hdp/data/hadoop/local/usercache/user/appcache/applicati on_1402963354379_0013/container_1402963354379_0013_02_000001/jobSubmi tDir/job.split file:/c:/hdp/data/hadoop/local/usercache/user/appcac he/application_1402963354379_0013/container_1402963354379_0013_02_000 001/job.xml file:/c:/hdp/data/hadoop/local/usercache/user/appcache/ application_1402963354379_0013/container_1402963354379_0013_02_000001 /job.jar/
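As an aside, the mid-word splits in the dump above ("loc al", "hadoo p") are an artifact of the JAR manifest format itself, not corruption: a manifest wraps every line at 72 bytes and marks continuations with a leading space, so a raw read shows paths broken mid-word while a manifest parser re-joins them. A small self-contained sketch of that round trip (the paths are made up):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestClassPathDemo {

    /** Builds a manifest with the given Class-Path entries and returns its raw text. */
    public static String buildManifest(String... jars) throws IOException {
        Manifest mf = new Manifest();
        Attributes main = mf.getMainAttributes();
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.CLASS_PATH, String.join(" ", jars));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        mf.write(out); // wraps at 72 bytes; continuation lines start with a space
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        String text = buildManifest(
            "file:/c:/some/fairly/long/container/path/lib/dependency-alpha.jar",
            "file:/c:/some/fairly/long/container/path/lib/dependency-beta.jar");
        System.out.println(text); // raw form: long paths appear split mid-word
        // Parsing re-joins the wrapped lines into the original value.
        Manifest parsed = new Manifest(new ByteArrayInputStream(text.getBytes("UTF-8")));
        System.out.println(parsed.getMainAttributes().getValue("Class-Path"));
    }
}
```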

However, this classpath does not explicitly add the jars inside that directory. That is, the directory from the manifest above,
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/
does contain the jar whose classes YARN cannot find (this directory holds all the jars from the fat jar's lib section), but from the Java point of view such a classpath entry looks wrong: to pick up the jars, the directory would have to be listed with a star,
e.g.:
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/*
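This matches how the Java classpath works in general: a directory entry exposes only the .class files and resources under it, never jars sitting inside it; jars must be listed individually, or via a dir/* wildcard that the java launcher (not the class loaders) expands. A self-contained sketch of the difference, using a resource inside a throwaway jar as a stand-in for a dependency class:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class DirVsJarClasspath {

    /** Writes a throwaway jar containing one resource entry and returns its path. */
    static Path makeJar(Path dir, String resource) throws IOException {
        Path jar = dir.resolve("dep.jar");
        try (JarOutputStream jos = new JarOutputStream(Files.newOutputStream(jar))) {
            jos.putNextEntry(new JarEntry(resource));
            jos.write("marker".getBytes("UTF-8"));
            jos.closeEntry();
        }
        return jar;
    }

    /** True if a class loader rooted at url can see the named resource. */
    static boolean visible(URL url, String resource) throws IOException {
        try (URLClassLoader cl = new URLClassLoader(new URL[]{url}, null)) {
            return cl.findResource(resource) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("libdemo");
        Path jar = makeJar(dir, "marker.txt");
        // A bare directory entry does not search jars sitting inside it:
        System.out.println(visible(dir.toUri().toURL(), "marker.txt")); // false
        // The jar itself as an entry (what a dir/* wildcard expands to) works:
        System.out.println(visible(jar.toUri().toURL(), "marker.txt")); // true
    }
}
```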

What am I doing wrong in how I pass the dependencies to YARN?
Could this be a cluster configuration problem, or is it a bug in this Hadoop distribution (HDP 2.1 on Windows x64)?
