Is it possible to convert a Hive table format to ORC and make it bucketed?

Posted by ecbunoof on 2021-06-25 in Hive


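I have a set of Hive tables that are not in ORC format and are not bucketed. I want to change their format to ORC and also make them bucketed. I couldn't find a concrete answer online, so any answer or guidance would be greatly appreciated. The Hive version is 2.3.5.
Alternatively, is it possible to do this in Spark (PySpark or Scala)?
The simplest solution would be to create a new table in ORC format and insert into it from the old table, but I'm looking for an in-place solution.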

Hive orc acid

Source: https://stackoverflow.com/questions/58618140/is-it-possible-to-convert-a-hive-table-format-to-orc-and-make-it-bucketed

2 answers


zpgglvta1#

Create a bucketed table and load the data into it with INSERT OVERWRITE:

CREATE TABLE table_bucketed(col1 string, col2 string)
CLUSTERED BY(col1) INTO 10 BUCKETS
STORED AS ORC;

INSERT OVERWRITE TABLE table_bucketed
select ...
  from table_not_bucketed;

See also: Bucketed Sorted Tables.
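Since the question asks for something close to in-place: as far as I know, Hive's `ALTER TABLE ... SET FILEFORMAT` and `ALTER TABLE ... CLUSTERED BY ... INTO n BUCKETS` only change the table metadata and do not rewrite the existing files. The usual workaround is therefore to copy into a new bucketed ORC table and then swap the names. The Python sketch below only builds that HiveQL sequence as strings. The helper name `orc_bucketed_swap_statements`, the `_orc_tmp` suffix and the `sales` example table are hypothetical, and it assumes a non-partitioned table. Verify the copy before running the `DROP TABLE`: for a managed table it deletes the original data.

```python
def orc_bucketed_swap_statements(table, columns, bucket_col, num_buckets):
    """Build HiveQL that rebuilds `table` as a bucketed ORC table under the
    same name. `columns` is a list of (name, hive_type) pairs.
    Hypothetical helper; assumes a non-partitioned source table."""
    names = [name for name, _ in columns]
    if bucket_col not in names:
        raise ValueError(f"bucket column {bucket_col!r} is not in the column list")

    tmp = f"{table}_orc_tmp"  # hypothetical name for the intermediate table
    col_defs = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    col_list = ", ".join(names)

    return [
        # 1. New table with the desired layout: bucketed and stored as ORC.
        f"CREATE TABLE {tmp} ({col_defs}) "
        f"CLUSTERED BY ({bucket_col}) INTO {num_buckets} BUCKETS STORED AS ORC",
        # 2. Rewrite the data; Hive hashes rows into buckets during the insert.
        f"INSERT OVERWRITE TABLE {tmp} SELECT {col_list} FROM {table}",
        # 3. Drop the old table (deletes its data if it is a managed table).
        f"DROP TABLE {table}",
        # 4. Give the new table the old name.
        f"ALTER TABLE {tmp} RENAME TO {table}",
    ]


for stmt in orc_bucketed_swap_statements(
        "sales", [("id", "INT"), ("region", "STRING")], "id", 10):
    print(stmt + ";")
```

Generating the statements rather than running them keeps the sketch independent of any particular Hive client; the output can be pasted into beeline after reviewing it.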

Answered 2021-06-26

ehxuflar2#

Hive: use a staging table to read the unbucketed data (assuming TEXTFILE format), with the following commands:

CREATE TABLE staging_table(
    col1 colType, 
    col2 colType, ...
    coln colType
)
STORED AS 
    TEXTFILE
LOCATION 
    '/path/of/input/data';

CREATE TABLE target_table(
    col1 colType, 
    col2 colType, ...
    coln colType
)
CLUSTERED BY(col1) INTO 10 BUCKETS
STORED AS ORC;

INSERT OVERWRITE TABLE target_table
SELECT 
    col1, col2, ..., coln
FROM 
    staging_table;
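To illustrate what `CLUSTERED BY(col1) INTO 10 BUCKETS` does to the rows inserted above, here is a small Python sketch of how Hive 2.x picks a bucket for an INT clustering key. It assumes the original (pre-Hive 3) bucketing scheme, in which an INT's hash code is the value itself and the bucket is `(hash & Integer.MAX_VALUE) % numBuckets`; other column types, and Hive 3's Murmur3-based scheme, hash differently. The function names are hypothetical.

```python
from collections import defaultdict

JAVA_INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE


def hive_int_bucket(value: int, num_buckets: int) -> int:
    """Bucket index for a 32-bit INT key under pre-Hive-3 bucketing (assumed):
    the INT's hash code is the value itself, masked to be non-negative."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value must fit in a 32-bit Hive INT")
    # Python's & on negative ints matches Java's two's-complement result here.
    return (value & JAVA_INT_MAX) % num_buckets


def assign_buckets(keys, num_buckets):
    """Group clustering-key values by the bucket (file) they would land in."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[hive_int_bucket(key, num_buckets)].append(key)
    return dict(buckets)


print(assign_buckets([1, 11, 2, -1], 10))  # {1: [1, 11], 2: [2], 7: [-1]}
```

Rows with the same key value always end up in the same bucket, which is what bucketed joins and `TABLESAMPLE(BUCKET x OUT OF y ON col1)` rely on.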

The same approach can be used with the **Spark** DataFrame API (assuming CSV input) as follows:

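from pyspark.sql import SparkSession

# Hive support lets saveAsTable register the table in the Hive metastore
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

df = (spark.read.format("csv")
      .option("inferSchema", "true")
      .option("header", "true")
      .option("delimiter", ",")
      .option("path", "/path/of/input/data/")
      .load())

# bucketBy only works with saveAsTable, not save(). Spark's bucketing uses a
# different hash than Hive's, so Hive may not treat these as Hive buckets.
(df.write.format("orc")
   .bucketBy(10, "col1")
   .option("path", "/path/of/output/data/")
   .saveAsTable("target_table"))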

Answered 2021-06-26
