Hive error "not a SequenceFile" when joining two tables stored as ORC with "snappy" compression

Posted by mgdq6dx1 on 2021-06-29 in Hive


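I am getting a "not a SequenceFile" error when performing an outer join. The same kind of join used to work with the same settings on similar tables, and I cannot tell what has changed. The error now appears when joining fairly large tables over a larger key space.
I am running Hive 0.13.1 on Cloudera 5.3.0 with YARN. Both tables are stored as ORC with tblproperties ("orc.compress"="snappy").
Storage information: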

SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:  No

Diagnostic messages for this task:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://my_cluster:9000/user/hive/warehouse/my_table/000000_0 not a SequenceFile
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://my_cluster:9000/user/hive/warehouse/my_table/000000_0 not a SequenceFile
    at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
    at org.apache.hadoop.hive.
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 35  Reduce: 1   Cumulative CPU: 2742.67 sec   HDFS Read: 8762733372 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 45 minutes 42 seconds 670 msec
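
The message says the reader expected a SequenceFile at that path, while the table is declared as ORC. One quick sanity check is to look at the first bytes of the data file: ORC files normally begin with the bytes "ORC" and SequenceFiles with "SEQ". The following is only a minimal sketch, assuming the file has first been copied out of HDFS to local disk (for example with hdfs dfs -get); the helper names detect_format and detect_file are hypothetical and not part of Hive or Hadoop.

```python
# Leading magic bytes of the two container formats.
ORC_MAGIC = b"ORC"
SEQ_MAGIC = b"SEQ"


def detect_format(header: bytes) -> str:
    """Classify a file from its leading bytes (hypothetical helper)."""
    if header.startswith(ORC_MAGIC):
        return "orc"
    if header.startswith(SEQ_MAGIC):
        return "sequencefile"
    return "unknown"


def detect_file(local_path: str) -> str:
    """Read the first three bytes of a local copy of the file and classify it."""
    with open(local_path, "rb") as f:
        return detect_format(f.read(3))
```

If a local copy of 000000_0 is reported as "orc", the file itself matches the table definition, and the mismatch would lie in how the reduce side is reading it rather than in how the table was written.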

My Hive settings:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=10000;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.created.files=150000;
set hive.error.on.empty.partition=true;
set hive.cli.print.header=true;
set hive.optimize.s3.query=true;
set hive.auto.convert.join=true;
set mapred.child.java.opts=-Xmx2048m;
set hive.error.on.empty.partition=false;
set hive.hadoop.supports.splittable.combineinputformat=true;
set hive.enforce.bucketing=true;
set hive.optimize.bucketmapjoin=true;
set hive.mapjoin.smalltable.filesize=50000000;
set hive.resultset.use.unique.column.names=false;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;

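I tried declaring both tables as SequenceFile instead. That produced a different error on the full-size tables, an IndexOutOfBounds error, but no error on a small sample.
The metastore is MySQL.
The full list of Hive/Hadoop settings is long, but I can look it up; I just don't know what to look for.
If this is related to I/O or a corrupted HDFS, how can I check the health of HDFS?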
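
For the HDFS health question, the standard tool is hdfs fsck, which reports missing, corrupt, and under-replicated blocks for a path. Below is a minimal sketch, assuming the hdfs command is on the PATH and using the warehouse directory from the trace as the target. The function names are hypothetical, and the "is HEALTHY" check relies on the summary line that fsck usually prints, so treat it as an assumption rather than a guaranteed interface.

```python
import subprocess
from typing import List


def fsck_command(path: str) -> List[str]:
    """Build an `hdfs fsck` invocation that lists files, blocks and locations."""
    return ["hdfs", "fsck", path, "-files", "-blocks", "-locations"]


def looks_healthy(fsck_output: str) -> bool:
    """Assumption: fsck ends its report with a line containing 'is HEALTHY'."""
    return "is HEALTHY" in fsck_output


def run_fsck(path: str) -> str:
    """Run fsck on the given HDFS path and return its textual report."""
    result = subprocess.run(
        fsck_command(path), capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    # Table directory taken from the error message above.
    report = run_fsck("/user/hive/warehouse/my_table")
    print("healthy" if looks_healthy(report) else "check the report for problems")
```

A clean report on the table directory would suggest that block corruption is not the cause of the join failure.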

mysql Hive hdfs yarn metastore

Source: https://stackoverflow.com/questions/37329821/hive-error-not-a-sequencefile-occuring-on-joining-two-tables-in-orc-snappy-f
