Athena: creating an external table whose field name differs from the Parquet column name

Posted by hrirmatl on 2021-06-26 in Hive

I want to create an external table with Athena. The data being read is in Parquet format, and my external table script is:

CREATE EXTERNAL TABLE IF NOT EXISTS my_table (
    a string,  
    b string,  
    y string  
) PARTITIONED BY (
    year bigint,
    month bigint,
    day bigint 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
    'serialization.format' = '1'
) LOCATION 's3://my_path/to/parquet'
TBLPROPERTIES ('has_encrypted_data'='false');

However, the column names in my Parquet files are a, b, x. How can I map the field x so that it appears under the name y in my external table?

Hive parquet external-tables amazon-athena

Source: https://stackoverflow.com/questions/47519458/athena-creating-external-table-with-field-name-different-from-parquet-column-n

1 answer

sbtkgmzw1#

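Actually, this is possible, but it has some drawbacks.
In Athena, tables stored as Parquet are read by column name by default. This gives you the flexibility to reorder the columns in the table or to add new columns in the middle of the table.
If you can do without that flexibility, you can have the Parquet columns read by index instead by specifying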

WITH SERDEPROPERTIES ('parquet.column.index.access'='true')

In your case, this would look like

CREATE EXTERNAL TABLE IF NOT EXISTS my_table (
    a string,  
    b string,  
    y string  
) PARTITIONED BY (
    year bigint,
    month bigint,
    day bigint 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
    'parquet.column.index.access'='true',
    'serialization.format' = '1'
) LOCATION 's3://my_path/to/parquet'
TBLPROPERTIES ('has_encrypted_data'='false');
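
Once the table exists, its partitions still have to be registered before a query returns rows. The statements below are a minimal sketch of that step and of a query that reads the Parquet column x under the name y. The partition values and the S3 sub-paths are hypothetical and only illustrate the shape of the statements; the real layout under s3://my_path/to/parquet is not given in the question.

```sql
-- Run each statement as a separate query.

-- Option 1 (assumes a Hive-style layout such as
-- s3://my_path/to/parquet/year=2021/month=6/day=26/, which is hypothetical):
MSCK REPAIR TABLE my_table;

-- Option 2 (any other layout): register a partition explicitly.
-- The sub-path 2021/06/26/ is an assumption made for illustration.
ALTER TABLE my_table ADD IF NOT EXISTS
  PARTITION (year = 2021, month = 6, day = 26)
  LOCATION 's3://my_path/to/parquet/2021/06/26/';

-- With index-based access, the third table column (y) is filled from
-- the third column of the Parquet files (x).
SELECT a, b, y
FROM my_table
WHERE year = 2021 AND month = 6 AND day = 26
LIMIT 10;
```

If y comes back with values from an unexpected column, the column order in the files most likely differs from the order in the DDL.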

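Note that this requires the partition columns to be in the same order as you write them in the DDL statement.
You can read more about this issue in the AWS docs repo.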
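
If giving up by-name reads is not acceptable, one alternative (not part of the answer above, shown only as a sketch) is to keep the Parquet column name in the table and do the rename in a view. The names my_table_raw and my_table_renamed are hypothetical.

```sql
-- Run each statement as a separate query.

-- Hypothetical base table that keeps the Parquet column name x and
-- relies on Athena's default by-name reads for Parquet.
CREATE EXTERNAL TABLE IF NOT EXISTS my_table_raw (
    a string,
    b string,
    x string
) PARTITIONED BY (
    year bigint,
    month bigint,
    day bigint
)
STORED AS PARQUET
LOCATION 's3://my_path/to/parquet'
TBLPROPERTIES ('has_encrypted_data'='false');

-- Hypothetical view that exposes x under the name y.
CREATE OR REPLACE VIEW my_table_renamed AS
SELECT a, b, x AS y, year, month, day
FROM my_table_raw;
```

Queries would then go against my_table_renamed, at the cost of maintaining a second object next to the table.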

2021-06-26
