bigquery聚集每日的backlog值

disho6za  于 2021-07-29  发布在  Java
关注(0)|答案(2)|浏览(334)

我有一张这样的table:

ID      | Date                        | Language
    ----------------------------------------------
    A    | 2020-06-09 07:00:00.342 UTC | EN
    A    | 2020-06-09 17:15:00.342 UTC | EN
    A    | 2020-07-16 23:11:37.342 UTC | EN
    A    | 2020-07-16 17:11:37.342 UTC | SN

    B    | 2020-06-09 17:11:37.342 UTC | SN
    B    | 2020-06-09 17:11:37.342 UTC | EN
    B    | 2020-07-16 17:11:37.342 UTC | SN
    B    | 2020-07-16 17:11:37.342 UTC | EN
    .... (and many more dates for other ID)

我的查询目标是每天的当前数据应该是每个id的历史数据的总和。到目前为止,我的查询能够对每天的语言总数进行总和,但我不确定如何编辑查询,以便它也可以添加积压的数据,而不是仅针对特定日期的数据。我试着跟帖(因为我的问题和这个问题很相似,只是这个问题要求只取最新的id,但我考虑所有id),但是很困惑,因为我当前的查询有子查询。
到目前为止,我的工作(只计算了每天的价值,没有累计):

WITH
    table1 AS (
    SELECT  
     ID ,Date, 
     SUM(CASE WHEN time >= '06:00:00' AND time <= '11:59:00' THEN 1 ELSE 0 END) AS morning,
     SUM(CASE WHEN time >= '00:00:00' AND time <= '05:59:00' THEN 1 ELSE 0 END) AS night,
     FROM (
         SELECT   
          ID , TIME_TRUNC(TIME(Timestamp), SECOND) AS time,  DATE(Timestamp) as Date, Language
         FROM 
            t
         GROUP BY ID, Language,  DATE(Timestamp) 
    )
GROUP BY  ID, Date
    ),

    table2 AS (
    SELECT  
      ID, 
      SUM(CASE WHEN Language = 'EN' THEN 1 ELSE 0 END) AS Sum_EN,
      SUM(CASE WHEN Language = 'SN' THEN 1 ELSE 0 END) AS Sum_SN
    FROM (
        SELECT DATE(Timestamp) as Date,   ID, 
          CASE 
          WHEN Preferred_ Language in ('EN', 'English') THEN 'EN'  ELSE Language 
          END AS Language,
        FROM 
          t 
        GROUP BY ID, Language, 
    )
GROUP BY    ID,   Date

    ))
    SELECT   *
    FROM   table1 
    FULL OUTER JOIN table2 USING (ID)

样本输出:

ID    | Date       | Sum_EN | Sum_SN 
    ------------------------------------
    A    | 2020-06-09 |   2    |   0
    A    | 2020-07-16 |   3    |   1

    B    | 2020-06-09 |   1    |   1
    B    | 2020-07-16 |   2    |   2
    .... (and many more dates for other ID)

聚合逻辑:
对于id a,在2020-06-09,英语的总数为2(注意,由于时间不同,一天中的id大于1),同时在2020-07-16,英语的总数为3,因为它也采用了前几天的值。
相同的逻辑概念适用于所有日期和ids,因此新日期的数据值基本上是所有积压的数据值的总和。

q9rjltbz

q9rjltbz1#

使用以下方法解决:

SUM(rating_total) OVER(
    PARTITION BY ID ORDER BY UNIX_DATE(Date) 
  ) n_rating,
sf6xfgos

sf6xfgos2#

您需要的查询更像这样:

select id, date(date), 
       countif(lang = 'EN') as en,
       countif(lang = 'SN') as sn,
       sum(countif(lang = 'EN')) over (partition by id order by min(date)) as en_running,
       sum(countif(lang = 'SN')) over (partition by id order by min(date)) as sn_running
from t
group by id, date(date);

我不确定你的查询中还有哪些条件。你不能在问题中描述他们。

相关问题