Finding the sum of data up to the latest date in BigQuery SQL

6uxekuva posted on 2021-07-29 in Java

I have a table like this:

    ID   | Date        | Language
    ---------------------------------
    A    | 2013-04-10  | EN
    A    | 2013-04-11  | EN
    A    | 2013-05-12  | SN
    B    | 2013-04-01  | SN
    B    | 2013-05-28  | SN
    .... (and many more dates for other ID)

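I want a query that picks the latest date for each ID, counts the rows per language for that ID, and does the same for every other ID, summing over all the backlog data up to the latest date. So for the data above, the result for ID = A should be 1 + 1 = 2 for EN and 1 for SN, and likewise for the other IDs. I found an almost identical question on S/O, but it was for a LINQ query, and I am not sure how to do the same in standard SQL.
This is what I have been trying (so far it only sums all the data, without a WHERE clause for the latest date):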

SELECT 
        ID, 
        Date,
        SUM(CASE WHEN Language = 'EN' THEN 1 ELSE 0 END) AS Sum_EN, #count all language from latest date of each ID
        SUM(CASE WHEN Language = 'SN' THEN 1 ELSE 0 END) AS Sum_SN,
    FROM t 
    #WHERE Date from latest date to all backlog data 
    GROUP BY ID, Date

Sample output:

    ID   | Date        | Sum_EN | Sum_SN
    --------------------------------------
    A    | 2013-05-12  | 2      |   1
    B    | 2013-05-28  | 0      |   2
    .... (and many more dates for other ID)

Note: I commented out the WHERE clause in the query because I am not sure how to select the latest date for each ID.

9jyewag0 #1

If you want the maximum date for each ID, group by ID only and select MAX(Date); that gives your expected output.

SELECT 
        ID, 
        max(Date) Date,
        SUM(CASE WHEN Language = 'EN' THEN 1 ELSE 0 END) AS Sum_EN, #count all language from latest date of each ID
        SUM(CASE WHEN Language = 'SN' THEN 1 ELSE 0 END) AS Sum_SN,
    FROM t 
    #WHERE Date from latest date to all backlog data 
    GROUP BY ID
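
To try the query above without creating a table, here is a minimal sketch that inlines the five sample rows from the question as a CTE. The CTE name `t` and the `UNNEST` of `STRUCT` literals are assumptions made for illustration; the real table is not shown in the question.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
WITH t AS (
  SELECT * FROM UNNEST([
    STRUCT('A' AS ID, DATE '2013-04-10' AS Date, 'EN' AS Language),
    ('A', DATE '2013-04-11', 'EN'),
    ('A', DATE '2013-05-12', 'SN'),
    ('B', DATE '2013-04-01', 'SN'),
    ('B', DATE '2013-05-28', 'SN')
  ])
)
SELECT
  ID,
  MAX(Date) AS Date,                                           -- latest date per ID
  SUM(CASE WHEN Language = 'EN' THEN 1 ELSE 0 END) AS Sum_EN,  -- all EN rows for the ID
  SUM(CASE WHEN Language = 'SN' THEN 1 ELSE 0 END) AS Sum_SN   -- all SN rows for the ID
FROM t
GROUP BY ID
ORDER BY ID;
-- A | 2013-05-12 | 2 | 1
-- B | 2013-05-28 | 0 | 2
```

Grouping by ID alone is what lets the counts cover every row of an ID, while MAX(Date) reports only the most recent date.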

q35jwt9p #2

Below is for BigQuery Standard SQL:

EXECUTE IMMEDIATE '''
SELECT id, MAX(Date) as Date, ''' || (
  SELECT STRING_AGG("COUNTIF(Language = '" || Language || "') AS Sum_" || Language ORDER BY Language)
  FROM (SELECT DISTINCT Language FROM `project.dataset.table`)
) || '''
FROM `project.dataset.table`
GROUP BY id
''';

Applied to the sample data from the question, the output is:

Row id  Date        Sum_EN  Sum_SN   
1   A   2013-05-12  2       1    
2   B   2013-05-28  0       2
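
For reference, here is a sketch of the statement that the dynamic SQL above should build, assuming the table contains only the two languages EN and SN (STRING_AGG joins the generated pieces with a comma by default). The table name `project.dataset.table` is the placeholder used in the answer, not a real table.

```sql
-- Expanded form of the generated query, assuming Language is only ever 'EN' or 'SN'.
SELECT id, MAX(Date) AS Date,
  COUNTIF(Language = 'EN') AS Sum_EN,
  COUNTIF(Language = 'SN') AS Sum_SN
FROM `project.dataset.table`
GROUP BY id;
```

The dynamic version is only worth the extra complexity when the set of languages is not known in advance; each new language adds one more Sum_ column without any change to the query text.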

68bkxrlz #3

If I understand correctly from the result set, you want:

SELECT ID, MAX(Date) as date,
       COUNTIF(Language = 'EN') AS Sum_EN, 
       COUNTIF(Language = 'SN') AS Sum_SN,
FROM t 
GROUP BY ID;

However, your description suggests:

SELECT ID, MAX(Date) as date,
       COUNTIF(Language = 'EN') AS Sum_EN, 
       COUNTIF(Language = 'SN') AS Sum_SN,
FROM (SELECT t.*,
             DENSE_RANK() OVER (PARTITION BY ID ORDER BY DATE DESC) as seqnum
      FROM t
     ) t
WHERE seqnum = 1
GROUP BY ID;

However, this only returns the data for the last date of each ID, so I think you want the first version.
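
To make the difference concrete, here is a minimal sketch that runs the DENSE_RANK version against the five sample rows from the question. The inlined CTE `t` is an assumption for illustration only.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
WITH t AS (
  SELECT * FROM UNNEST([
    STRUCT('A' AS ID, DATE '2013-04-10' AS Date, 'EN' AS Language),
    ('A', DATE '2013-04-11', 'EN'),
    ('A', DATE '2013-05-12', 'SN'),
    ('B', DATE '2013-04-01', 'SN'),
    ('B', DATE '2013-05-28', 'SN')
  ])
)
SELECT ID, MAX(Date) AS Date,
       COUNTIF(Language = 'EN') AS Sum_EN,
       COUNTIF(Language = 'SN') AS Sum_SN
FROM (
  -- seqnum = 1 marks the rows that fall on the latest date of each ID.
  SELECT t.*,
         DENSE_RANK() OVER (PARTITION BY ID ORDER BY Date DESC) AS seqnum
  FROM t
) ranked
WHERE seqnum = 1
GROUP BY ID
ORDER BY ID;
-- A | 2013-05-12 | 0 | 1
-- B | 2013-05-28 | 0 | 1
```

Only the rows on each ID's latest date survive the filter, so the two earlier EN rows for A are dropped and the result no longer matches the sample output in the question.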
