Split class org.apache.hadoop.hive.ql.io.orc.OrcSplit not found

Posted by nkoocmlb on 2021-05-29 in Hadoop

I am trying to use ORC as the input format for Hadoop Streaming.
Here is how I run it:

export HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec.jar
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -file /home/mr/mapper.py -mapper /home/mr/mapper.py \
    -file /home/mr/reducer.py -reducer /home/mr/reducer.py \
    -input /user/cloudera/input/users/orc \
    -output /user/cloudera/output/simple \
    -inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat

But I get this error:

Error: java.io.IOException: Split class org.apache.hadoop.hive.ql.io.orc.OrcSplit not found
    at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:363)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.ql.io.orc.OrcSplit not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2018)
    at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:361)
    ... 7 more

It looks like the OrcSplit class should be in hive-exec.jar.
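
One way to confirm that guess is to look inside the jar, since a jar is an ordinary zip archive. The sketch below is only an illustration: the helper name `jar_contains_class` is made up for this example and is not part of Hadoop or Hive. It assumes the class is stored under the usual package-as-directory path.

```python
import zipfile


def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if the jar (a zip archive) has an entry for the class.

    class_name is a fully qualified Java name, for example
    "org.apache.hadoop.hive.ql.io.orc.OrcSplit".
    Hypothetical helper for illustration only.
    """
    # Class files are stored as path entries: dots become slashes.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Calling it with the jar path from the question and `"org.apache.hadoop.hive.ql.io.orc.OrcSplit"` should tell you whether the class is packaged there. Note that this only checks the jar on the machine where you run it. It says nothing about whether the task nodes can see that jar.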

Answer 1 (ac1kyiln)

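A simpler solution is to let Hadoop Streaming distribute the library jars for you with the -libjars argument. This argument takes a comma-separated list of jars. It is a generic Hadoop option, so it must come before the streaming-specific options. Taking your example, you could run: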

hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -libjars /opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec.jar \
    -file /home/mr/mapper.py -mapper /home/mr/mapper.py \
    -file /home/mr/reducer.py -reducer /home/mr/reducer.py \
    -input /user/cloudera/input/users/orc \
    -output /user/cloudera/output/simple \
    -inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
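
If you launch the job from a script, the same argument order can be captured in a small helper. This is a rough sketch, not an official API: `build_streaming_cmd` and its parameter names are invented for this example. It only assembles the argument list shown above, placing -libjars (comma-joined) ahead of the streaming options.

```python
from typing import List, Sequence


def build_streaming_cmd(
    streaming_jar: str,
    lib_jars: Sequence[str],
    mapper: str,
    reducer: str,
    input_path: str,
    output_path: str,
    input_format: str,
) -> List[str]:
    """Assemble a `hadoop jar <streaming jar> ...` argument list.

    Hypothetical helper for illustration; it does not run anything.
    """
    cmd = ["hadoop", "jar", streaming_jar]
    # Generic options such as -libjars go before the streaming options.
    if lib_jars:
        cmd += ["-libjars", ",".join(lib_jars)]
    cmd += [
        "-file", mapper, "-mapper", mapper,
        "-file", reducer, "-reducer", reducer,
        "-input", input_path,
        "-output", output_path,
        "-inputformat", input_format,
    ]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where the `hadoop` command is available.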

Answer 2 (xpcnnkqh)

I found the answer. My problem was that I had set the HADOOP_CLASSPATH variable on only one node. So I either have to set it on every node or use the distributed cache.
