GROUP BY: how to generate a column from the distribution of values in another column in Hive SQL

c9qzyr3d  posted on 2021-06-24  in Hive

I want to produce this result in Hive:

| COL1 | HISTOGRAM             |
+------+-----------------------+
|  a   | {"A":2, "B":2}        |
|  b   | {"C":2, "A":1, "B":1} |

From this table:

| COL1 | COL2 |
+------+------+
| a    | A    |
| a    | B    |
| a    | A    |
| a    | B    |
| b    | A    |
| b    | B    |
| b    | C    |
| b    | C    |

I believe Presto SQL does exactly what I want:

select COL1, histogram(COL2)
from sample_table 
group by COL1

ybzsozfc1#

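You can compute the counts grouped by col1 and col2, then collect the col2:cnt pairs into an array with collect_set or collect_list, join the array into one string with concat_ws using a comma as the delimiter, and convert that string to a map with str_to_map.
Demo: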

with data_example as (
select stack (8, --number of tuples
 'a','A'
,'a','B'
,'a','A'
,'a','B'
,'b','A'
,'b','B'
,'b','C'
,'b','C'
) as (COL1,COL2)
)

select col1, str_to_map(concat_ws(',',collect_set(concat(col2,':',cnt)))) histogram
from
(
select col1, col2, count(*) cnt from data_example group by col1, col2
)s
group by col1
;

Result:

col1    histogram
a   {"A":"2","B":"2"}
b   {"A":"1","B":"1","C":"2"}
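
The demo relies on the default delimiters of str_to_map (',' between pairs and ':' between key and value), so it would misparse a col2 value that itself contains a comma or a colon. A minimal sketch of a variant that passes the delimiters explicitly follows. The sample rows and the choice of '&' and '=' are assumptions for illustration, not part of the original answer, and the delimiters must be characters that never occur in col2. As far as I know the delimiters may be treated as regular expressions, so characters such as '|' are best avoided.

```sql
-- Sketch only: '&' and '=' are assumed never to appear in col2.
with data_example as (
select stack (4, --number of tuples
 'a','x:1'
,'a','x:1'
,'a','y,2'
,'b','x:1'
) as (col1,col2)
)

select col1,
       str_to_map(
         -- build "key=count" pairs and join them with '&'
         concat_ws('&', collect_list(concat(col2, '=', cast(cnt as string)))),
         '&',  -- delimiter between key-value pairs
         '='   -- delimiter between key and value
       ) histogram
from
(
select col1, col2, count(*) cnt from data_example group by col1, col2
)s
group by col1
;
```

Because the inner query already groups by col1 and col2, each pair is unique within a col1 group, so collect_list and collect_set give the same pairs here. The map values are strings, as in the result above.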
