Hive SQL aggregation: merging multiple SQL queries into one

Posted by izkcnapc on 2021-06-25 in Hive


I have a series of SQL queries, such as:

select count(distinct userId) from table where hour >= 0 and hour <= 0;
select count(distinct userId) from table where hour >= 0 and hour <= 1;
select count(distinct userId) from table where hour >= 0 and hour <= 2;
...
select count(distinct userId) from table where hour >= 0 and hour <= 14;

Is there a way to merge them into a single SQL query?

Hive hiveql

Source: https://stackoverflow.com/questions/58744194/hive-sql-aggregate-merge-multiple-sqls-into-one

1 Answer


cigdeys31#

It looks like you are trying to keep a cumulative count by hour. For that, you can use a window function, like this:

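-- Note: `name` here plays the role of `userId` in the question.
-- Assumes at most one row per (name, hour); repeated rows would be counted more than once.
SELECT DISTINCT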
  A.hour AS hour,
  SUM(COALESCE(M.include, 0)) OVER (ORDER BY A.hour) AS cumulative_count
FROM ( -- get all records, with 0 for include
  SELECT
    name,
    hour,
    0 AS include
  FROM
    table
  ) A
  LEFT JOIN
  ( -- get the record with lowest `hour` for each `name`, and 1 for include
    SELECT
      name,
      MIN(hour) AS hour,
      1 AS include
    FROM 
      table
    GROUP BY
      name
  ) M
  ON  M.name = A.name
  AND M.hour = A.hour
;

There may be a simpler way, but this should generally give the correct answer.
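
One possibly simpler formulation, offered as a sketch and not as the answerer's method: find the first hour in which each user appears, count the users first seen in each hour, and keep a running total of those counts. It uses the userId column and the placeholder table name from the question; the aliases first_hour and new_users are made up for illustration.

```sql
-- Sketch under assumptions: `table` is the placeholder name from the question
-- (backticked because TABLE is a keyword); first_hour and new_users are
-- illustrative aliases.
SELECT
  hour,
  -- running total of users whose first appearance is at or before this hour
  SUM(new_users) OVER (
    ORDER BY hour
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS cumulative_count
FROM (
  -- number of users first seen in each hour
  SELECT
    first_hour AS hour,
    COUNT(*)   AS new_users
  FROM (
    -- one row per user: the earliest hour in which that user appears
    SELECT
      userId,
      MIN(hour) AS first_hour
    FROM `table`
    WHERE hour >= 0 AND hour <= 14
    GROUP BY userId
  ) f
  GROUP BY first_hour
) h;
```

Because every user is reduced to a single row before counting, repeated rows for the same user do not inflate the total. An hour in which no user appears for the first time produces no output row; its cumulative count equals that of the previous listed hour.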

Explanation:

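This uses 2 subqueries against the same input table, with a derived field called include that tracks which records should contribute to the final total for each bucket. The first subquery simply takes all records in the table and assigns 0 AS include. The second subquery finds every unique name and the lowest hour slot in which that name appears, and assigns it 1 AS include. The enclosing query combines the two subqueries with a LEFT JOIN.
The outermost query applies COALESCE(M.include, 0) to fill in any NULL values produced by the LEFT JOIN, and the resulting 1s and 0s are summed with SUM over a window ordered by hour. This has to be a SELECT DISTINCT and not a GROUP BY, because a GROUP BY would require both hour and include to be listed, yet it would end up collapsing every record in a given hour group into a single row (still with include=1). The DISTINCT is applied after the SUM, so it removes duplicate output rows without discarding any input rows.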
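
To make the role of include concrete, the join step can be run on its own against a few made-up rows. The sketch below is illustrative only: the CTE sample and its five rows are invented, and only the joined rows with their include flags are selected, before any summing.

```sql
-- Hypothetical sample data: five (name, hour) rows, one row per pair.
WITH sample AS (
  SELECT 'a' AS name, 0 AS hour UNION ALL
  SELECT 'b', 0 UNION ALL
  SELECT 'a', 1 UNION ALL
  SELECT 'c', 1 UNION ALL
  SELECT 'b', 2
)
SELECT
  A.name AS name,
  A.hour AS hour,
  COALESCE(M.include, 0) AS include  -- 1 only on a name's first hour
FROM sample A
LEFT JOIN (
  -- first hour in which each name appears
  SELECT name, MIN(hour) AS hour, 1 AS include
  FROM sample
  GROUP BY name
) M
  ON  M.name = A.name
  AND M.hour = A.hour
ORDER BY hour, name;
-- name  hour  include
-- a     0     1
-- b     0     1
-- a     1     0
-- c     1     1
-- b     2     0
```

Summing include up to each hour gives 2, 3 and 3 for hours 0, 1 and 2, which matches count(distinct name) for hour <= 0, hour <= 1 and hour <= 2 on the same rows.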

Answered 2021-06-26
