BigQuery: finding data that may be in any of several columns

Posted by 3wabscal on 2021-07-29 in Java

I have a table with the following data:

id|task1_name|task1_date|task2_name|task2_date
1|breakfast|1/1/20|null|null
2|null|null|breakfast|1/1/20
3|null|null|lunch|1/1/20
4|dinner|1/1/20|lunch|1/1/10

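I want to build a view that always shows a given task's date in the same column, or null if that task name is not found in any of the name columns.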

id|dinner_date|lunch_date|breakfast_date
1|null|null|1/1/20
2|null|null|1/1/20
3|null|1/1/20|null
4|1/1/20|1/1/10|null

I tried using nested IF statements.

SELECT *,
  IF(task1_name = 'dinner', task1_date, IF(task2_name = 'dinner', task2_date, NULL)) AS `dinner_date`
FROM t

But since the real dataset has around 50 columns, this seems like a silly solution that would get complicated very quickly. Is there a smarter way?


jgwigjjp #1

One approach uses a CASE expression:

select t.*,
       (case when task1_name = 'dinner' then task1_date
             when task2_name = 'dinner' then task2_date
             when task3_name = 'dinner' then task3_date
        end) as dinner_date             
from t;
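
With around 50 column pairs and several task names, writing every branch by hand gets tedious. One option is to generate the query text with a short script. The following is a minimal Python sketch, not part of the original answer: the function name build_case_sql and the alias src are hypothetical, and it assumes the columns are named task<N>_name and task<N>_date for N = 1..n_pairs.

```python
def build_case_sql(task_names, n_pairs, table="t"):
    """Return a SELECT with one CASE expression per task name.

    Assumes columns named task<N>_name / task<N>_date for N = 1..n_pairs.
    Task names are pasted into the SQL text, so only plain identifiers
    (letters, digits, underscore) are accepted.
    """
    select_items = ["src.*"]
    for task in task_names:
        if not task.isidentifier():
            raise ValueError(f"unsupported task name: {task!r}")
        # One WHEN branch per (name, date) column pair.
        whens = "\n".join(
            f"             when task{i}_name = '{task}' then task{i}_date"
            for i in range(1, n_pairs + 1)
        )
        select_items.append(f"(case\n{whens}\n        end) as {task}_date")
    return "select " + ",\n       ".join(select_items) + f"\nfrom {table} as src;"


sql = build_case_sql(["dinner", "lunch", "breakfast"], n_pairs=2)
print(sql)
```

The printed text can be pasted into a view definition. The list of task names still has to be known up front.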

dzhpxtsq #2

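The following is for BigQuery Standard SQL and is generic enough to solve the problem as expressed in the question. You do not need to know the number of columns or the task names in advance (though the values must not contain , or : characters, which should not be a big limitation here and can be worked around if needed).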


# standardSQL

CREATE TEMP TABLE ttt AS 
SELECT id, 
  SPLIT(k, '_')[OFFSET(0)] task, 
  MAX(IF(SPLIT(k, '_')[OFFSET(1)] = 'name', v, NULL)) AS name,
  MAX(IF(SPLIT(k, '_')[OFFSET(1)] = 'date', v, NULL)) AS DAY
FROM (
  SELECT id, 
    TRIM(SPLIT(kv, ':')[OFFSET(0)], '"') k,
    TRIM(SPLIT(kv, ':')[OFFSET(1)], '"') v
  FROM `project.dataset.table` t,
  UNNEST(SPLIT(TRIM(TO_JSON_STRING(t), '{}'))) kv
  WHERE TRIM(SPLIT(kv, ':')[OFFSET(0)], '"') != 'id'
  AND TRIM(SPLIT(kv, ':')[OFFSET(1)], '"') != 'null'
)
GROUP BY id, task;

EXECUTE IMMEDIATE '''
SELECT id, ''' || (
  SELECT STRING_AGG(DISTINCT "MAX(IF(name = '" || name || "', day, NULL)) AS " || name || "_date")
  FROM ttt
) || '''  
FROM ttt
GROUP BY 1
ORDER BY 1
'''

Note: the only assumption here is that the columns are named task<N>_name and task<N>_date. Applied to sample data similar to that in your question:

WITH `project.dataset.table` AS (
  SELECT 1 id, 'breakfast' task1_name, '1/1/21' task1_date, NULL task2_name, NULL task2_date UNION ALL
  SELECT 2, NULL, NULL, 'breakfast', '1/1/22' UNION ALL
  SELECT 3, NULL, NULL, 'lunch', '1/1/23' UNION ALL
  SELECT 4, 'dinner', '1/1/24', 'lunch', '1/1/10'
)

the output is

Row id  breakfast_date  lunch_date  dinner_date  
1   1   1/1/21          null        null     
2   2   1/1/22          null        null     
3   3   null            1/1/23      null     
4   4   null            1/1/10      1/1/24
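
The query above works in two steps: it first unpivots every row into one record per task slot (id, name, date), then pivots those records into one column per distinct task name. The same logic can be sketched in plain Python to make the steps easier to follow. This is only an illustration on in-memory data, not BigQuery code. The function names unpivot and pivot are hypothetical, and unlike the SQL it keeps ids that have no tasks at all.

```python
from collections import defaultdict

rows = [
    {"id": 1, "task1_name": "breakfast", "task1_date": "1/1/21", "task2_name": None, "task2_date": None},
    {"id": 2, "task1_name": None, "task1_date": None, "task2_name": "breakfast", "task2_date": "1/1/22"},
    {"id": 3, "task1_name": None, "task1_date": None, "task2_name": "lunch", "task2_date": "1/1/23"},
    {"id": 4, "task1_name": "dinner", "task1_date": "1/1/24", "task2_name": "lunch", "task2_date": "1/1/10"},
]


def unpivot(rows):
    """Step 1: one (id, name, date) record per non-empty task slot."""
    slots = defaultdict(dict)  # (id, "task1") -> {"name": ..., "date": ...}
    for row in rows:
        for column, value in row.items():
            if column == "id" or value is None:
                continue
            slot, kind = column.split("_", 1)  # "task1_name" -> ("task1", "name")
            slots[(row["id"], slot)][kind] = value
    return [(row_id, s.get("name"), s.get("date")) for (row_id, _), s in slots.items()]


def pivot(records, ids):
    """Step 2: one output row per id, one <name>_date column per distinct name."""
    names = sorted({name for _, name, _ in records if name is not None})
    out = {row_id: {f"{n}_date": None for n in names} for row_id in ids}
    for row_id, name, day in records:
        if name is None or day is None:
            continue
        key = f"{name}_date"
        prev = out[row_id][key]
        # Like MAX(...) in the SQL: keep the largest value if a name repeats.
        out[row_id][key] = day if prev is None else max(prev, day)
    return out


result = pivot(unpivot(rows), [r["id"] for r in rows])
for row_id, cols in result.items():
    print(row_id, cols)
```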

gmxoilav #3

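Here is another solution. It does not use dynamic SQL, does not depend on specific column names, and handles any number of columns, though the task names themselves are listed explicitly: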

WITH table AS (
  SELECT 1 id, 'breakfast' task1_name, '1/1/21' task1_date, NULL task2_name, NULL task2_date UNION ALL
  SELECT 2, NULL, NULL, 'breakfast', '1/1/22' UNION ALL
  SELECT 3, NULL, NULL, 'lunch', '1/1/23' UNION ALL
  SELECT 4, 'dinner', '1/1/24', 'lunch', '1/1/10'
)
SELECT
  id,
  REGEXP_EXTRACT(f, r'breakfast\, ([^\,\)]*)') AS breakfast_date,
  REGEXP_EXTRACT(f, r'lunch\, ([^\,\)]*)') AS lunch_date,
  REGEXP_EXTRACT(f, r'dinner\, ([^\,\)]*)') AS dinner_date
FROM (
  SELECT id, FORMAT("%t", t) f FROM table t
)
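
The idea is to render each row as a single string and pull out whatever follows "<task name>, " with a regular expression. A small Python sketch of that extraction follows. It assumes that FORMAT("%t", t) renders a row as a parenthesized, comma-separated list such as (4, dinner, 1/1/24, lunch, 1/1/10) with NULL for missing values. The helper names format_row and date_for are hypothetical.

```python
import re


def format_row(values):
    """Mimic the assumed FORMAT("%t", row) shape: "(1, breakfast, 1/1/21, NULL, NULL)"."""
    return "(" + ", ".join("NULL" if v is None else str(v) for v in values) + ")"


def date_for(task, formatted):
    """Return the field that directly follows `task` in the formatted row, if any."""
    match = re.search(re.escape(task) + r", ([^,)]*)", formatted)
    return match.group(1) if match else None


rows = [
    (1, "breakfast", "1/1/21", None, None),
    (2, None, None, "breakfast", "1/1/22"),
    (3, None, None, "lunch", "1/1/23"),
    (4, "dinner", "1/1/24", "lunch", "1/1/10"),
]

for row in rows:
    f = format_row(row)
    print(row[0], date_for("breakfast", f), date_for("lunch", f), date_for("dinner", f))
```

Two caveats follow from matching on plain text: a task name that is the tail of another name (for example a hypothetical "brunch" containing "unch") would also match, and a task with a missing date yields the literal string NULL rather than a null value.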
