"Start token not found" error when using JsonSerDe

rqenqsqc  posted on 2021-06-26 in Hive

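I am trying to import JSON data from S3, run some queries on it, and export the output back to S3 in JSON format. However, the Hive step on my EMR cluster fails with "org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected". To narrow the problem down I simplified both the Hive script and the JSON data, but I always get the same error. How can I solve this?
Cluster configuration:
Release: emr-5.3.1
Hive version: 2.1.1
Hadoop distribution: Amazon 2.7.3
Service role: EMR_DefaultRole
Master instance type: m4.large
Content of the simplified JSON data: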

[{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]

Hive script:

DROP TABLE IF EXISTS SOURCE;
DROP TABLE IF EXISTS DESTINATION;

CREATE EXTERNAL TABLE SOURCE(MyID STRING, MyField STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://myPath/subPath/';

CREATE EXTERNAL TABLE DESTINATION(MyID STRING, MyField STRING)                                    
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://anotherPath/subPath/';

INSERT OVERWRITE TABLE DESTINATION SELECT MyID, MyField FROM SOURCE;

Here is the stack trace:

Vertex failed, vertexName=Map 4, vertexId=vertex_1278452616863_0001_1_00, diagnostics=[Task failed, taskId=task_1278452616863, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1278452616863:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable [{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable [{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:95)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:383)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
    ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable [{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86)
    ... 17 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected
    at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:183)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:128)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:92)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:488)
    ... 18 more
Caused by: java.io.IOException: Start token not found where expected
    at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:169)
    ... 21 more

Thanks.

yebdmbv4  #1

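The JSON should start with { (an object), not with [ (an array). JsonSerDe reads each line of the file as one record and expects that record to be a JSON object, so a top-level array triggers the "Start token not found where expected" error.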
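One way to act on this is to rewrite the source file as newline-delimited JSON (one object per line) before Hive reads it. The following is only a minimal sketch in Python: the function name array_to_json_lines is hypothetical, and it assumes the whole array fits in memory and that every element is a flat object like the ones in the question.

```python
import json


def array_to_json_lines(array_text):
    """Convert the text of a JSON array of objects into newline-delimited JSON."""
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Compact separators keep every object on a single line without extra spaces.
    lines = [json.dumps(record, separators=(",", ":")) for record in records]
    return "\n".join(lines) + "\n"


# Sample data taken from the question.
source = '[{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]'
print(array_to_json_lines(source), end="")
```

The converted text would then be uploaded to the table's S3 location in place of the original array file; how that upload is done is outside this sketch.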

k5hmc34c  #2

I tried changing the structure of the JSON file like this:

{"MyID":"FOO123","MyField":"FOO"},
{"MyID":"BAR123","MyField":"BAR"}

But after doing so, I noticed that only the first object was inserted into the table.
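The thread does not say why only one row arrived. A reasonable first check, assuming the intended layout is one standalone JSON object per physical line with no separating commas, is to validate the file line by line. The sketch below is in Python; check_json_lines is a hypothetical helper, and it is deliberately stricter than the SerDe itself may be.

```python
import json


def check_json_lines(text):
    """Return (line_number, reason) for each line that is not exactly one JSON object."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped, not reported
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            # For example, a trailing comma after the closing brace ends up here.
            problems.append((number, exc.msg))
            continue
        if not isinstance(value, dict):
            problems.append((number, "not a JSON object"))
    return problems


# The layout tried above: the first line ends with a comma.
attempt = '{"MyID":"FOO123","MyField":"FOO"},\n{"MyID":"BAR123","MyField":"BAR"}\n'
for number, reason in check_json_lines(attempt):
    print(f"line {number}: {reason}")
```

If the check reports nothing and rows are still missing, it is also worth confirming that the objects are separated by real newline characters in the uploaded file rather than sitting on a single line.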
