org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Parquet does not support timestamp. See HIVE-6384;
The above error occurs when executing the following code in Azure Databricks.
```
spark_session.sql("""
CREATE EXTERNAL TABLE IF NOT EXISTS dev_db.processing_table
(
campaign STRING,
status STRING,
file_name STRING,
arrival_time TIMESTAMP
)
PARTITIONED BY (
Date DATE)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION "/mnt/data_analysis/pre-processed/"
""")
1 Answer
kyxcudwk 1#
According to the HIVE-6384 JIRA, starting from Hive 1.2 you can use the TIMESTAMP and DATE types in Parquet tables. Workarounds for Hive versions earlier than 1.2:
1. Using STRING type:
Declare arrival_time and Date as STRING, then cast them to TIMESTAMP and DATE during processing. You can use a view to cast the columns, but views are slow. A sketch of this approach is shown below.
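For illustration only, a minimal sketch of the STRING workaround, assuming the files stay in Parquet; the names processing_table_str and processing_table_typed are hypothetical and not part of the original answer:

```
# Workaround 1 (sketch): keep the temporal columns as STRING in the
# Parquet-backed table, then cast them in a view for downstream queries.
# "processing_table_str" and "processing_table_typed" are made-up names.
spark_session.sql("""
CREATE EXTERNAL TABLE IF NOT EXISTS dev_db.processing_table_str
(
  campaign STRING,
  status STRING,
  file_name STRING,
  arrival_time STRING          -- kept as STRING to avoid HIVE-6384
)
PARTITIONED BY (Date STRING)   -- partition column also kept as STRING
STORED AS PARQUET
LOCATION '/mnt/data_analysis/pre-processed/'
""")

# Cast to the proper types in a view; the casts run on every query,
# which is why the answer notes that views are slow.
spark_session.sql("""
CREATE OR REPLACE VIEW dev_db.processing_table_typed AS
SELECT
  campaign,
  status,
  file_name,
  CAST(arrival_time AS TIMESTAMP) AS arrival_time,
  CAST(Date AS DATE)              AS Date
FROM dev_db.processing_table_str
""")
```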
2. Using ORC format:
```
CREATE EXTERNAL TABLE IF NOT EXISTS dev_db.processing_table
(
  campaign STRING,
  status STRING,
  file_name STRING,
  arrival_time TIMESTAMP
)
PARTITIONED BY (Date DATE)
STORED AS ORC
LOCATION '/mnt/data_analysis/pre-processed/';
```
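As a usage note, a sketch of running the ORC variant from the same Azure Databricks notebook; the MSCK REPAIR TABLE step assumes partition directories of the form Date=<value> already exist under the mounted location:

```
# Execute the ORC DDL from the notebook, then register any Date=<value>
# partition directories that already exist under the table location.
orc_ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS dev_db.processing_table
(
  campaign STRING,
  status STRING,
  file_name STRING,
  arrival_time TIMESTAMP
)
PARTITIONED BY (Date DATE)
STORED AS ORC
LOCATION '/mnt/data_analysis/pre-processed/'
"""
spark_session.sql(orc_ddl)

# Pick up partitions that exist on storage but are not yet in the metastore.
spark_session.sql("MSCK REPAIR TABLE dev_db.processing_table")

# Sanity check: the TIMESTAMP and DATE columns now read without the
# HIVE-6384 error seen with the Parquet SerDe.
spark_session.sql(
    "SELECT campaign, arrival_time, `Date` FROM dev_db.processing_table LIMIT 10"
).show()
```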