最近,我们将hive数据归档到s3(数据格式为orc),但是雅典娜在查询数据时出现了一些错误。
当我执行 select * from ta_event_10 limit 1
,雅典娜查询出错,错误如下:
hive#u bad#u data:orc中字段#event#u time的类型timestamp与表架构中定义的类型varchar不兼容
另外,有些sql可以运行,但结果不正确,例如 select "#event_time" from ta_event_10 limit 10
.
Hive show create table
:
CREATE TABLE `ta.ta_event_10`(
`#user_id` bigint,
`#event_name` string,
`#event_time` timestamp,
`#app_id` string,
`#account_id` string,
`#distinct_id` string,
`#server_time` timestamp,
`#kafka_offset` bigint,
`#device_id` string,
`version_code` double,
`#province` string,
`#device_model` string,
`#city` string,
`channel` string,
`launchsource` string,
`flavor` string,
`#app_version` string,
`#os` string,
`#country_code` string,
`#ip` string,
`#os_version` string,
`#manufacturer` string,
`#country` string,
`version_name` string,
`vpn` string,
`parallelversion` string,
`segmentid` string,
`user_dimen` double,
`packagename` string,
`packageversion` string,
`installsource` string,
`packagelabel` string,
`isrunning` string,
`payment_platform` string,
`buy_status` string,
`productid` string,
`orderid` string,
`#zone_offset` double)
PARTITIONED BY (
`$part_event` string,
`$part_date` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://ta2/user/hive/warehouse/ta.db/ta_event_10'
TBLPROPERTIES (
'presto_query_id'='20200113_034929_06651_2dhiw',
'presto_version'='316-ta',
'transient_lastDdlTime'='1578887369')
雅典娜 show create table
:
CREATE EXTERNAL TABLE `ta_event_10`(
`#user_id` bigint,
`#event_name` string,
`#event_time` timestamp,
`#app_id` string,
`#account_id` string,
`#distinct_id` string,
`#server_time` timestamp,
`#kafka_offset` bigint,
`#device_id` string,
`version_code` double,
`#province` string,
`#device_model` string,
`#city` string,
`channel` string,
`launchsource` string,
`flavor` string,
`#app_version` string,
`#os` string,
`#country_code` string,
`#ip` string,
`#os_version` string,
`#manufacturer` string,
`#country` string,
`version_name` string,
`vpn` string,
`parallelversion` string,
`segmentid` string,
`user_dimen` double,
`packagename` string,
`packageversion` string,
`installsource` string,
`packagelabel` string,
`isrunning` string,
`payment_platform` string,
`buy_status` string,
`productid` string,
`orderid` string,
`#zone_offset` double)
PARTITIONED BY (
`$part_event` string,
`$part_date` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
's3://test-ta-export/ta_event_10'
TBLPROPERTIES (
'transient_lastDdlTime'='1583763184')
“#事件#时间”字段都是时间戳类型。
我通过元数据查找一个orc文件 orc-metadata
,它们看起来像这样:
先用Hive和雅典娜。
暂无答案!
目前还没有任何答案,快来回答吧!