UNION ALL doesn't generate any data in Hive

Posted by 6za6bjd0 on 2021-05-27 in Hadoop


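I am trying to run a UNION ALL over three different tables that share the same DDL structure, but the final output contains zero rows. I don't know what is happening during execution. Can anyone share their thoughts? My sample Hive SQL is shown below. Thank you.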

    SET hive.execution.engine=tez;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.qubole.cleanup.partial.data.on.failure=true;
    SET hive.tez.container.size=8192;
    SET tez.task.resource.memory.mb=8192;
    SET tez.task.resource.cpu.vcores=2;
    SET hive.mapred.mode=nonstrict;
    SET hive.qubole.dynpart.use.prefix=true;
    SET hive.vectorized.execution.enabled=true;
    SET hive.vectorized.execution.reduce.enabled =true;
    SET hive.cbo.enable=true;
    SET hive.compute.query.using.stats=true;
    SET hive.stats.fetch.column.stats=true;
    SET hive.stats.fetch.partition.stats=true;
    SET mapred.reduce.tasks = -1;
    SET hive.auto.convert.join.noconditionaltask.size=2730;
    SET hive.auto.convert.join=true;
    SET hive.auto.convert.join.noconditionaltask=true;
    SET hive.auto.convert.join.noconditionaltask.size=405306368;
    SET hive.compute.query.using.stats=true;
    SET hive.stats.fetch.column.stats=true;
    SET hive.stats.fetch.partition.stats=true;
    SET mapreduce.job.reduce.slowstart.completedmaps=0.8;

    CREATE  TABLE IF NOT EXISTS X STORED AS PARQUET AS 
      SELECT a,
             b,
             c
        FROM A
      UNION ALL
      SELECT a,
             b,
             c
        FROM B
      UNION ALL
      SELECT a,
             b,
             c
        FROM C;

If I run the query below on Presto, it shows that there is data.

SELECT COUNT(1) FROM 
(
          SELECT a,
                 b,
                 c
            FROM A
          UNION ALL
          SELECT a,
                 b,
                 c
            FROM B
          UNION ALL
          SELECT a,
                 b,
                 c
            FROM C 
)Z;

sql hadoop Hive hiveql hive-query

Source: https://stackoverflow.com/questions/54996941/union-all-doesnt-generate-any-data-in-hive

1 Answer


dwthyt8l1#

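When UNION ALL runs on Tez, its branches run in parallel and create additional subdirectories in the table location (check what is in the table location). Before reading the table, try adding these configuration settings so that Hive can read the subdirectories: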

set hive.mapred.supports.subdirectories=true; 
set mapred.input.dir.recursive=true;
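
To confirm that subdirectories are the cause, the check suggested above could look roughly like the following sketch. The path is a hypothetical placeholder that you would replace with the location reported for your own table. Disabling `hive.compute.query.using.stats` is an extra assumption of this sketch, not part of the original answer: the question enables that setting, and with it Hive may answer a plain count from metastore statistics instead of scanning the files.

```sql
-- Sketch only: the path below is a hypothetical placeholder, not from the original post.

-- 1. Find the table's storage location (see the "Location:" row in the output).
DESCRIBE FORMATTED X;

-- 2. List that location recursively from the Hive CLI or Beeline.
--    Subdirectories here would be consistent with the explanation above.
dfs -ls -R /path/to/table/location/x;

-- 3. Let Hive read files inside subdirectories, then count again.
SET hive.mapred.supports.subdirectories=true;
SET mapred.input.dir.recursive=true;
-- Assumption: turn off stats-based answers so COUNT actually scans the files.
SET hive.compute.query.using.stats=false;
SELECT COUNT(*) FROM X;
```

If the count is non-zero only after the two subdirectory settings are applied, the data was written and the problem is on the read side.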

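Your query is quite simple and runs on mappers only, with each subquery writing its own subdirectory without interfering with the others.
Alternatively, you can force a reducer step by adding distribute by at the end, or order by (runs slower), by running UNION instead of UNION ALL, by applying a filter after the union, and so on. The query will then create files in the table folder without subdirectories: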

CREATE  TABLE IF NOT EXISTS X STORED AS PARQUET AS 
select * from 
(
      SELECT a,
             b,
             c
        FROM A
      UNION ALL
      SELECT a,
             b,
             c
        FROM B
      UNION ALL
      SELECT a,
             b,
             c
        FROM C
      )s distribute by a; --this will force reducer step
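
Of the other options mentioned, plain UNION is the easiest to sketch. The version below is an illustration under assumptions, not the answerer's code: `X_dedup` is a hypothetical table name chosen so that it does not collide with an existing `X`, and UNION (distinct) needs Hive 1.2.0 or later. Note that UNION removes duplicate rows, so the result can have fewer rows than the UNION ALL version; use it only if that is acceptable.

```sql
-- Sketch only: X_dedup is a hypothetical table name.
-- UNION (without ALL) deduplicates rows across A, B and C, which adds a
-- reducer step to the plan instead of a mapper-only write.
CREATE TABLE IF NOT EXISTS X_dedup STORED AS PARQUET AS
SELECT a, b, c FROM A
UNION
SELECT a, b, c FROM B
UNION
SELECT a, b, c FROM C;
```

Also keep in mind that `CREATE TABLE IF NOT EXISTS ... AS` does nothing when the target table already exists, so an empty table left over from an earlier attempt has to be dropped before re-running the statement.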

Answered 2021-05-27
