bigquery-summing

8e2ybdfx  于 2021-07-24  发布在  Java
关注(0)|答案(1)|浏览(281)

我有一张这样的table:

word   nb_by_date.date    nb_by_date.nb
---------------------------------------
abc    2020-01-01         17
       2020-01-06         43
abc    2020-01-01         33
       2020-01-05         12
       2020-01-06         5
def    2020-01-02         11
       2020-01-05         8
def    2020-01-02         1

您可以使用以下方法获取此示例:

WITH t AS (
SELECT "abc" AS word, [STRUCT('2020-01-01' AS date, 17 AS nb), STRUCT('2020-01-06' AS date, 43 AS nb)]
UNION ALL SELECT "abc" AS word, [STRUCT('2020-01-01' AS date, 33 AS nb), STRUCT('2020-01-05' AS date, 12 AS nb), STRUCT('2020-01-06' AS date, 5 AS nb)]
UNION ALL SELECT "def" AS word, [STRUCT('2020-01-02' AS date, 11 AS nb), STRUCT('2020-01-05' AS date, 8 AS nb)]
UNION ALL SELECT "def" AS word, [STRUCT('2020-01-02' AS date, 1 AS nb)]
)

我的目标是:

word   nb_by_date.date    nb_by_date.nb
---------------------------------------
abc    2020-01-01         50
       2020-01-05         12
       2020-01-06         55
def    2020-01-02         22
       2020-01-05         8

我的尝试是:

SELECT
  word,
  ARRAY(
  SELECT STRUCT(date, SUM(nb))
  FROM UNNEST(nb_by_date)
  GROUP BY date
  ORDER BY date) nb_by_date
FROM (
  SELECT word, ARRAY_CONCAT_AGG(nb_by_date) nb_by_date
  FROM t
  GROUP BY word
)

它适用于这个玩具的例子。但是,我有大量的数据和使用 ARRAY_CONCAT_AGG(nb_by_date) 创建超过100mb限制的行(无法查询大于100mb限制的行)。如何调整查询以使其即使在有大量数据的情况下也能工作?

ygya80vv

ygya80vv1#

可以使用两个聚合级别:

WITH t AS (
      SELECT 'abc' AS word, [STRUCT('2020-01-01' AS date, 17 AS nb), STRUCT('2020-01-06' AS date, 43 AS nb)] as ar UNION ALL
      SELECT 'abc' AS word, [STRUCT('2020-01-01' AS date, 33 AS nb), STRUCT('2020-01-05' AS date, 12 AS nb), STRUCT('2020-01-06' AS date, 5 AS nb)] UNION ALL
      SELECT 'def' AS word, [STRUCT('2020-01-02' AS date, 11 AS nb), STRUCT('2020-01-05' AS date, 8 AS nb)] UNION ALL
      SELECT 'def' AS word, [STRUCT('2020-01-02' AS date, 1 AS nb)]
     )
select t.word, array_agg(struct( date, nb) order by date) as ar
from (select t.word, el.date, sum(el.nb) as nb
      from t cross join
           unnest(t.ar) el
      group by t.word, el.date
     ) t
group by word

相关问题