I'm trying to build an external Hive table over .gz log files stored in HDFS. After running this query:
CREATE EXTERNAL TABLE table_name(att1 STRING,att3 STRING,att4 STRING,att5 STRING) row format serde "org.openx.data.jsonserde.JsonSerDe" with serdeproperties ("ignore.malformed.json"="true") STORED AS TEXTFILE LOCATION 'hdfs:////hdfs_location/';
then, when I run
select count(*) from table_name;
it fails with the following stack trace:
TaskAttempt 2 failed, info=[Error: Error while running task ( failure ) : attempt_1534417036833_0016_1_00_000054_2:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.EOFException: Unexpected end of input stream
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.EOFException: Unexpected end of input stream
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
    ... 14 more
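In Hadoop, `java.io.EOFException: Unexpected end of input stream` usually comes from the decompressor when a compressed file ends before its stream is complete, for example a truncated or partially written .gz file. So, alongside the encoding question, it may be worth ruling out a damaged archive. Below is a minimal Ruby sketch, assuming local copies of the HDFS files (e.g. fetched with `hdfs dfs -get`); the function name `gzip_problem` and the command-line usage are hypothetical, and it only checks the first gzip member of each file.

```ruby
require "zlib"

# Sketch: returns nil if the gzip data in `io` decompresses cleanly to the
# end (including the CRC/length footer check), otherwise a description of
# the zlib error. Only the first gzip member is checked.
def gzip_problem(io)
  gz = Zlib::GzipReader.new(io)
  # Read in 64 KiB chunks so large files are not loaded into memory at once;
  # GzipReader#read(length) returns nil once the stream is exhausted.
  while gz.read(64 * 1024)
    # discard the data; we only care whether decompression succeeds
  end
  nil
rescue Zlib::Error => e
  "#{e.class}: #{e.message}"
ensure
  gz&.close
end

# Hypothetical usage: ruby check_gzip.rb file1.gz file2.gz ...
ARGV.each do |path|
  problem = File.open(path, "rb") { |f| gzip_problem(f) }
  puts "#{path}: #{problem || 'OK'}"
end
```

A truncated file typically surfaces here as a `Zlib::GzipFile::Error` ("unexpected end of file"); a file that prints `OK` decompresses fully, at least as far as its first member goes.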
I tried inspecting the contents by validating the JSON, using this Ruby script:
require "zlib"
require "json"

path = "/home/test_directory/file.gz"

# Decompress the file and try to parse each line as a JSON object.
Zlib::GzipReader.open(path) do |gz|
  gz.each_line do |line_content|
    begin
      JSON.parse(line_content)
    rescue JSON::ParserError
      # The line is not valid JSON.
      p "json parsing exception -- " + line_content.strip
    rescue StandardError => ex
      # Any other failure, such as an encoding error.
      puts "An error of type #{ex.class} happened, message is #{ex.message}"
    end
  end
end
For some lines in the file it prints this exception message:
An error of type Encoding::InvalidByteSequenceError happened, message is "\xC3" on US-ASCII
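The `"\xC3" on US-ASCII` message most likely says more about how Ruby read the file than about the data. Without an explicit encoding, `Zlib::GzipReader` tags each line with Ruby's default external encoding, which is typically US-ASCII under a `C`/`POSIX` locale, and `JSON.parse` then fails converting that string to UTF-8 because every byte above `0x7F` is invalid in US-ASCII. `0xC3` is a common UTF-8 lead byte (`é`, for instance, is `C3 A9`), so these lines may well be valid UTF-8. Here is a sketch that re-reads a file with an explicit UTF-8 external encoding and lists only the lines that are genuinely invalid UTF-8; the function name `invalid_utf8_lines` is hypothetical.

```ruby
require "zlib"

# Sketch: returns the 1-based numbers of lines that are not valid UTF-8.
# The external encoding is forced to UTF-8, so the result does not depend
# on the process locale.
def invalid_utf8_lines(io)
  gz = Zlib::GzipReader.new(io, external_encoding: Encoding::UTF_8)
  bad = []
  lineno = 0
  gz.each_line do |line|
    lineno += 1
    bad << lineno unless line.valid_encoding?
  end
  bad
ensure
  gz&.close
end

# Hypothetical usage: ruby check_utf8.rb /home/test_directory/file.gz
ARGV.each do |path|
  bad = File.open(path, "rb") { |f| invalid_utf8_lines(f) }
  summary = bad.empty? ? "all lines are valid UTF-8" : "invalid UTF-8 on lines #{bad.join(', ')}"
  puts "#{path}: #{summary}"
end
```

If every line comes back valid, the non-ASCII bytes are ordinary UTF-8 text, which is the encoding the JSON SerDe is generally expected to handle, and the encoding lead is probably a red herring.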
I tried modifying my query based on https://community.hortonworks.com/articles/58548/processing-files-in-hive-using-native-non-utf8-cha.html. The change was to add this line after creating the external table:
ALTER TABLE table_name SET SERDEPROPERTIES ('serialization.encoding'='SJIS');
It didn't work. P.S. This is not the empty-file problem discussed on some online forums, since I have no empty files in HDFS.
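Setting `'serialization.encoding'='SJIS'` can only help if the files really are Shift_JIS, and `\xC3` is a typical UTF-8 lead byte. A quick way to narrow this down is to check which candidate encodings a failing line is even valid in. The sketch below does that in Ruby; the helper name `plausible_encodings` and the candidate list (UTF-8 and Shift_JIS) are assumptions to be adjusted to whatever encodings the log producer might use. Validity does not prove an encoding (the same bytes can be valid in several), but an encoding in which the bytes are invalid can be ruled out.

```ruby
# Sketch: report which candidate encodings accept a raw byte string.
# The candidate list is an assumption; extend it as needed.
CANDIDATE_ENCODINGS = [Encoding::UTF_8, Encoding::Shift_JIS].freeze

def plausible_encodings(bytes, candidates = CANDIDATE_ENCODINGS)
  candidates
    # dup so the caller's string keeps its original encoding tag
    .select { |enc| bytes.dup.force_encoding(enc).valid_encoding? }
    .map(&:name)
end
```

A line that failed to parse can be passed as raw bytes with `plausible_encodings(line.b)`.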
What can I do? This looks like some kind of character-encoding issue, but I can't find a fix.