hive - How to use DBT with AWS Athena and Apache Iceberg tables

Posted by kuuvgm7e on 2023-05-28 in Hive


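We have DBT models that we run against AWS Athena tables, and behind the scenes they create Hive external tables. We now have a case where a column's data type may change in the future. Hive-based Athena tables do not allow changing a column's data type, but Apache Iceberg tables do.
We copied the data from the old Hive table into an Iceberg table, but when we run the DBT model it fails with the following error: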

[error] [MainThread]: An error occurred (InvalidInputException) when calling the GetPartitions operation:

The DBT config for this model is below. It worked with the Hive external table, but it does not work with the Apache Iceberg table. Any suggestions?

{{
    config(materialized='incremental',
           external_location="s3://" + env_var('BUCKET-NAME') + "/" + env_var('SCHEMA-NAME') + "/" + this.identifier,
           partitioned_by = ['event_date'],
           incremental_strategy='insert_overwrite',
           on_schema_change='ignore'
    )
}}

The Apache Iceberg table was created as follows:

CREATE TABLE iceberg_table (
  id int,
  data string,
  event_date string) 
PARTITIONED BY (event_date) 
LOCATION 's3://DOC-EXAMPLE-BUCKET/iceberg-folder' 
TBLPROPERTIES (
  'table_type'='ICEBERG',
  'format'='parquet',
  'write_target_data_file_size_bytes'='536870912',
  'optimize_rewrite_delete_file_threshold'='10'
)


Source: https://stackoverflow.com/questions/76064879/how-to-use-dbt-with-aws-athena-with-apache-iceberg-tables

2 answers


#1 s4n0splo

I found that DBT currently does not support Apache Iceberg. We should use Apache Hudi instead.

2023-05-28

#2 omtl5h9j

DBT for Athena does support Apache Iceberg. Your config block would look something like this:

{{
  config(
    schema = env_var('DATABASE'),
    s3_data_dir='s3://' ~ env_var('BUCKET') ~ '/',
    s3_data_naming='table_unique',
    format='parquet',
    write_compression='GZIP',
    materialized='incremental',
    table_type='iceberg',
    incremental_strategy = 'merge',
    unique_key = ['key1', 'key2'],
    tags=["insert_tags"]
  )
}}

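Running your DBT model should then create the Iceberg table. With this config it follows an upsert strategy: based on the unique_key fields, it updates rows whose values have changed and inserts rows that are new.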
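To make the upsert behaviour concrete, here is a minimal in-memory sketch of merge semantics keyed on a composite unique key. It is not dbt's or Athena's implementation; the function name `merge_upsert` and the sample rows are hypothetical, and it assumes the target rows are already unique on the key columns.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def merge_upsert(target: List[Row], source: Iterable[Row], unique_key: List[str]) -> List[Row]:
    """Hypothetical illustration of merge/upsert semantics (not dbt's code).

    Source rows whose key matches a target row update it; source rows with a
    new key are inserted. Target rows absent from the source are left unchanged.
    Assumes target rows are already unique on `unique_key`.
    """

    def key_of(row: Row) -> Tuple[object, ...]:
        # Composite key built from the unique_key columns, in order.
        return tuple(row[k] for k in unique_key)

    # Index existing rows by key; dicts keep insertion order.
    merged: Dict[Tuple[object, ...], Row] = {key_of(r): dict(r) for r in target}
    for row in source:
        k = key_of(row)
        if k in merged:
            merged[k].update(row)  # key matched: update the existing row
        else:
            merged[k] = dict(row)  # key not matched: insert a new row
    return list(merged.values())


if __name__ == "__main__":
    target = [
        {"key1": 1, "key2": "a", "value": 10},
        {"key1": 2, "key2": "b", "value": 20},
    ]
    source = [
        {"key1": 2, "key2": "b", "value": 25},  # existing key -> updated
        {"key1": 3, "key2": "c", "value": 30},  # new key -> inserted
    ]
    print(merge_upsert(target, source, ["key1", "key2"]))
```

Keying on the full `unique_key` tuple is what lets a run re-process overlapping data without creating duplicates, which is the practical difference from appending rows blindly.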

2023-05-28
