我有下面的orc表定义,我想复制到parquet(我没有显示更多字段):
CREATE EXTERNAL TABLE `test_a`(
`some_id` int,
`sha_sum` string,
`parent_sha_sum` string,
`md5_sum` string
)
PARTITIONED BY (
`server_date` date
)
CLUSTERED BY (
sha_sum
)
SORTED BY (
sha_sum, parent_sha_sum, md5_sum
)
INTO 256 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://cluster/user/myuser/test_a'
TBLPROPERTIES (
'orc.compress'='ZLIB',
'orc.create.index'='true',
'orc.stripe.size'='130023424',
'orc.row.index.stride'='64000',
'orc.create.index'='true';
我在想我怎样才能把这个复制到Parquet地板上。我想使用zlib或类似的压缩,我想有索引和潜在的调整Parquet地板的tblproperty的一些。
CREATE EXTERNAL TABLE `test_b`(
`some_id` int,
`sha_sum` string,
`parent_sha_sum` string,
`md5_sum` string
)
PARTITIONED BY (
`server_date` date
)
CLUSTERED BY (
sha_sum
)
SORTED BY (
sha_sum, parent_sha_sum, md5_sum
)
INTO 256 BUCKETS
STORED AS PARQUET
LOCATION 'hdfs://cluster/user/myuser/test_b'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true'
)
是否有一个清单,所有的选项可供Parquet通过TBLProperty?
暂无答案!
目前还没有任何答案,快来回答吧!