Querying a Hive external table from Presto: Invalid UTF-8 start byte

dced5bon · published 2021-06-26 in Hive

I just installed Presto, and when I query Hive data with the Presto CLI, I get the following error:

~$ presto --catalog hive --schema default
presto:default> select count(*) from test3;

Query 20171213_035723_00007_3ktan, FAILED, 1 node
Splits: 131 total, 14 done (10.69%)
0:18 [1.04M rows, 448MB] [59.5K rows/s, 25.5MB/s]

Query 20171213_035723_00007_3ktan failed: com.facebook.presto.hive.$internal.org.codehaus.jackson.JsonParseException: Invalid UTF-8 start byte 0xa5
 at [Source: java.io.ByteArrayInputStream@6eb5bdfd; line: 1, column: 376]

This error only occurs when I use an aggregate function (count, sum, etc.). The same query works on the Hive CLI, though (but converting the query into a MapReduce job takes a lot of time):
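The error message points at a concrete byte (`0xa5`) that is not valid UTF-8, so one way to confirm the diagnosis is to scan the raw data files for bytes that break UTF-8 decoding. A minimal sketch (the file path and sample content are hypothetical; in practice you would run this against the files under the table's HDFS location):

```python
def find_invalid_utf8(path):
    """Yield (line_number, column, byte) for the first byte on each line
    that breaks strict UTF-8 decoding."""
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as e:
                yield lineno, e.start + 1, raw[e.start]

# Hypothetical sample: a line-delimited JSON record containing the 0xa5
# byte mentioned in the Presto error.
sample = b'{"app": "test\xa5value"}\n'
with open("/tmp/sample.json", "wb") as f:
    f.write(sample)

for lineno, col, byte in find_invalid_utf8("/tmp/sample.json"):
    print(f"line {lineno}, column {col}: invalid byte 0x{byte:02x}")
```

If this reports hits, the data itself contains non-UTF-8 bytes; Hive's default text handling is more lenient about them than the Jackson-based JSON parsing Presto's Hive plugin performs, which would explain why only Presto fails.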

$ hive
WARNING: Use "yarn jar" to launch YARN applications.

Logging initialized using configuration in file:/etc/hive/2.4.2.0-258/0/hive-log4j.properties
hive> select count(*) from test3;
...
MapReduce Total cumulative CPU time: 17 minutes 56 seconds 600 msec
Ended Job = job_1511341039258_0024
MapReduce Jobs Launched:
Stage-Stage-1: Map: 87  Reduce: 1   Cumulative CPU: 1076.6 sec   HDFS Read: 23364693216 HDFS Write: 9 SUCCESS
Total MapReduce CPU Time Spent: 17 minutes 56 seconds 600 msec
OK
51751422
Time taken: 269.143 seconds, Fetched: 1 row(s)

The point is that the same query works on Hive but fails on Presto, and I don't know why. I suspect it is because the two JSON libraries used by Hive and Presto are different, but I'm not sure. I created the external table on Hive with the following query:

hive> create external table test2 (
        app string,
        contactRefId string,
        createdAt struct <`date`: string, timezone: string, timezone_type: bigint>,
        eventName string,
        eventTime bigint,
        shopId bigint)
      ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
      STORED AS TEXTFILE
      LOCATION '/data/data-new/2017/11/29/';
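For reference, the JsonSerDe in this DDL expects one JSON object per line in the text files under the table's location, with keys matching the column names. A sketch of such a record (all field values here are made up for illustration):

```python
import json

# Hypothetical record matching the declared schema (app string,
# contactRefId string, createdAt struct<...>, eventName string,
# eventTime bigint, shopId bigint).
record = {
    "app": "myapp",
    "contactRefId": "abc-123",
    "createdAt": {
        "date": "2017-11-29 10:00:00",
        "timezone": "UTC",
        "timezone_type": 3,
    },
    "eventName": "open",
    "eventTime": 1511949600,
    "shopId": 42,
}

# One such object per line in the backing files; a single non-UTF-8 byte
# anywhere on a line is enough to make a Jackson-based parser fail with
# "Invalid UTF-8 start byte".
line = json.dumps(record)
print(line)
```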

Can anyone help me?

kmb7vmvb

Posting this here for reference, since the OP recorded the solution:

I managed to solve the problem using https://github.com/electrum/hive-serde (added to Presto at /usr/lib/presto/plugin/hive-hadoop2/ and to the Hive cluster at /usr/lib/hive-hcatalog/share/hcatalog/).
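The fix amounts to building the replacement SerDe from that repository and dropping the resulting jar into both locations. A sketch of the steps, assuming the paths from the answer; the jar file name and the restart command depend on your build and distribution:

```shell
# Build the jar from https://github.com/electrum/hive-serde first
# (e.g. with mvn package), then copy it to both clusters.
# The jar name below is an assumption.
cp hive-serde-1.0.jar /usr/lib/presto/plugin/hive-hadoop2/
cp hive-serde-1.0.jar /usr/lib/hive-hcatalog/share/hcatalog/
# Restart the Presto server so the Hive plugin picks up the new jar,
# e.g. sudo service presto restart (command varies by installation).
```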
