Copying table settings from ORC to Parquet

Posted by j7dteeu8 on 2021-05-29 in Hadoop

I have the ORC table definition below, which I want to replicate as a Parquet table (not all fields are shown):

CREATE EXTERNAL TABLE `test_a`(
  `some_id` int,
  `sha_sum` string,
  `parent_sha_sum` string,
  `md5_sum` string
)
PARTITIONED BY (
  `server_date` date
)
CLUSTERED BY (
  sha_sum
)
SORTED BY (
  sha_sum, parent_sha_sum, md5_sum
)
INTO 256 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://cluster/user/myuser/test_a'
TBLPROPERTIES (
  'orc.compress'='ZLIB',
  'orc.create.index'='true',
  'orc.stripe.size'='130023424',
  'orc.row.index.stride'='64000'
);

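I'm wondering how I can replicate this in Parquet. I want to use zlib or a similar compression codec, I want indexes, and I may want to tune some of the Parquet TBLPROPERTIES. This is what I have so far: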

CREATE EXTERNAL TABLE `test_b`(
  `some_id` int,
  `sha_sum` string,
  `parent_sha_sum` string,
  `md5_sum` string
)
PARTITIONED BY (
  `server_date` date
)
CLUSTERED BY (
  sha_sum
)
SORTED BY (
  sha_sum, parent_sha_sum, md5_sum
)
INTO 256 BUCKETS
STORED AS PARQUET
LOCATION 'hdfs://cluster/user/myuser/test_b'
TBLPROPERTIES (
 'COLUMN_STATS_ACCURATE'='true'
)
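
For reference, here is a minimal sketch of how the ORC settings might map onto Parquet table properties. It assumes a Hive version whose Parquet writer reads `parquet.compression`, `parquet.block.size`, and `parquet.enable.dictionary` from TBLPROPERTIES; this varies by release, so treat the keys as something to verify rather than a confirmed list. GZIP is the deflate-based codec closest to ORC's ZLIB, and the row group (block) size plays roughly the role of `orc.stripe.size`. As far as I know there is no direct counterpart to `orc.create.index` or `orc.row.index.stride`: Parquet writes min/max statistics per row group, and newer Parquet writers add page-level column indexes without a table property.

```sql
-- Sketch only: the property keys below are assumed to be honored by the
-- Hive Parquet writer in use; verify against your Hive/Parquet versions.
ALTER TABLE test_b SET TBLPROPERTIES (
  -- Deflate-based codec, the nearest equivalent of 'orc.compress'='ZLIB'.
  'parquet.compression'='GZIP',
  -- Row group size in bytes (128 MiB), comparable to 'orc.stripe.size'.
  'parquet.block.size'='134217728',
  -- Dictionary encoding for repetitive column values.
  'parquet.enable.dictionary'='true'
);
```

These properties only affect files written after they are set; existing Parquet files keep whatever codec and row group size they were written with.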

Is there a list of all the options that can be set for Parquet through TBLPROPERTIES?
