Impala SQL: alternatives for transposing/pivoting a single row into a column, or for grouping by range

Posted by r55awzrz on 2021-06-26 in Impala
SELECT 
    SUM(CASE WHEN age >= 80 THEN 1 ELSE 0 END) AS '>=80',
    SUM(CASE WHEN age BETWEEN 70 AND 79 THEN 1 ELSE 0 END) AS '70-79',
    SUM(CASE WHEN age BETWEEN 60 AND 69 THEN 1 ELSE 0 END) AS '60-69',
    SUM(CASE WHEN age BETWEEN 50 AND 59 THEN 1 ELSE 0 END) AS '50-59',
    SUM(CASE WHEN age BETWEEN 40 AND 49 THEN 1 ELSE 0 END) AS '40-49',
    SUM(CASE WHEN age BETWEEN 30 AND 39 THEN 1 ELSE 0 END) AS '30-39',
    SUM(CASE WHEN age BETWEEN 20 AND 30 THEN 1 ELSE 0 END) AS '20-29',
    SUM(CASE WHEN age BETWEEN 10 AND 19 THEN 1 ELSE 0 END) AS '10-19',
    SUM(CASE WHEN age BETWEEN 0 AND 9 THEN 1 ELSE 0 END) AS '0-9'
FROM (SELECT * FROM table) a

I am using the query above to bucket ages into ranges. It outputs:

+------+-------+-------+-------+-------+-------+--------+---------+---------+
| >=80 | 70-79 | 60-69 | 50-59 | 40-49 | 30-39 | 20-29  | 10-19   | 0-9     |
+------+-------+-------+-------+-------+-------+--------+---------+---------+
| 136  | 394   | 1273  | 2530  | 3298  | 15384 | 194099 | 2244405 | 9780789 |
+------+-------+-------+-------+-------+-------+--------+---------+---------+

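I need to convert this to a column format, or find another bucketing method that lets the query above produce one row per range instead of a single row of values.
Desired output: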

+-----------+---------+
| age_range | freq    |
+-----------+---------+
| >=80      | 136     |
| 70-79     | 394     |
| 60-69     | 1273    |
| 50-59     | 2530    |
| 40-49     | 3298    |
| 30-39     | 15384   |
| 20-29     | 194099  |
| 10-19     | 2244405 |
| 0-9       | 9780789 |
+-----------+---------+

As far as I know, Impala does not support PIVOT?
Thanks for any help.

Answer #1, by nnvyjq4y

Use a CASE expression as the GROUP BY key:

-- Label each row with its age range, then count the rows per label.
SELECT (CASE WHEN age >= 80 THEN '>=80'
             WHEN age BETWEEN 70 AND 79 THEN '70-79'
             WHEN age BETWEEN 60 AND 69 THEN '60-69'
             WHEN age BETWEEN 50 AND 59 THEN '50-59'
             WHEN age BETWEEN 40 AND 49 THEN '40-49'
             WHEN age BETWEEN 30 AND 39 THEN '30-39'
             WHEN age BETWEEN 20 AND 29 THEN '20-29'
             WHEN age BETWEEN 10 AND 19 THEN '10-19'
             WHEN age BETWEEN 0 AND 9 THEN '0-9'
        END) AS age_group,
       COUNT(*) AS freq
FROM a
GROUP BY age_group;

EDIT:
This can be written more simply as:

-- Branches are checked top to bottom, so each one only needs a lower bound.
SELECT (CASE WHEN age >= 80 THEN '>=80'
             WHEN age >= 70 THEN '70-79'
             WHEN age >= 60 THEN '60-69'
             WHEN age >= 50 THEN '50-59'
             WHEN age >= 40 THEN '40-49'
             WHEN age >= 30 THEN '30-39'
             WHEN age >= 20 THEN '20-29'
             WHEN age >= 10 THEN '10-19'
             WHEN age >= 0 THEN '0-9'
        END) AS age_group,
       COUNT(*) AS freq
FROM a
GROUP BY age_group;

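This works because a CASE expression stops at the first condition that matches, so as long as the branches are ordered from the highest range down, each one only needs its lower bound.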
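If you want to check the first-match behaviour without an Impala cluster, something like the sketch below may help. It is not Impala: it uses Python's built-in sqlite3 module, a hypothetical in-memory table `a`, and made-up ages chosen to sit on the bucket boundaries, on the assumption that both engines evaluate CASE branches in order. It runs the explicit BETWEEN form and the lower-bound-only form and compares the counts per bucket.

```python
import sqlite3

# Hypothetical sample data; the real table is not shown in the question.
ages = [5, 9, 10, 19, 20, 29, 30, 45, 79, 80, 93]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (age INTEGER)")
conn.executemany("INSERT INTO a (age) VALUES (?)", [(x,) for x in ages])

# Explicit form: every branch states both ends of its range.
explicit_sql = """
SELECT (CASE WHEN age >= 80 THEN '>=80'
             WHEN age BETWEEN 70 AND 79 THEN '70-79'
             WHEN age BETWEEN 60 AND 69 THEN '60-69'
             WHEN age BETWEEN 50 AND 59 THEN '50-59'
             WHEN age BETWEEN 40 AND 49 THEN '40-49'
             WHEN age BETWEEN 30 AND 39 THEN '30-39'
             WHEN age BETWEEN 20 AND 29 THEN '20-29'
             WHEN age BETWEEN 10 AND 19 THEN '10-19'
             WHEN age BETWEEN 0 AND 9 THEN '0-9'
        END) AS age_group,
       COUNT(*) AS freq
FROM a
GROUP BY age_group
"""

# Simplified form: relies on CASE stopping at the first matching branch.
simplified_sql = """
SELECT (CASE WHEN age >= 80 THEN '>=80'
             WHEN age >= 70 THEN '70-79'
             WHEN age >= 60 THEN '60-69'
             WHEN age >= 50 THEN '50-59'
             WHEN age >= 40 THEN '40-49'
             WHEN age >= 30 THEN '30-39'
             WHEN age >= 20 THEN '20-29'
             WHEN age >= 10 THEN '10-19'
             WHEN age >= 0 THEN '0-9'
        END) AS age_group,
       COUNT(*) AS freq
FROM a
GROUP BY age_group
"""


def bucket_counts(sql):
    """Run a bucketing query and return {age_group: freq}."""
    return dict(conn.execute(sql).fetchall())


explicit = bucket_counts(explicit_sql)
simplified = bucket_counts(simplified_sql)

# Print one row per bucket for each form so they can be compared by eye.
for label in sorted(simplified):
    print(label, explicit.get(label), simplified[label])
```

With this sample, age 30 is counted once, under '30-39', in both forms, because the `age >= 30` branch is reached before `age >= 20`. Reordering the branches from lowest to highest would break the simplified form, since `age >= 0` would then match every non-negative age.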
