GROUP BY: How can I make a kind of column by distribution in a column in Hive SQL

Posted by c9qzyr3d on 2021-06-24 in Hive


I want to produce a result like this in Hive:

| COL1 | HISTOGRAM             |
+------+-----------------------+
|  a   | {"A":2, "B":2}        |
|  b   | {"C":2, "A":1, "B":1} |

From this table:

| COL1 | COL2 |
+------+------+
| a    | A    |
| a    | B    |
| a    | A    |
| a    | B    |
| b    | A    |
| b    | B    |
| b    | C    |
| b    | C    |

I think Presto SQL does exactly what I want:

select COL1, histogram(COL2)
from sample_table 
group by COL1

sql Hive group-by hiveql Histogram

Source: https://stackoverflow.com/questions/63592898/how-can-i-make-a-kind-of-column-by-distribution-in-a-column-in-hive-sql

1 Answer

ybzsozfc #1

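You can compute the counts grouped by col1 and col2, then use collect_set or collect_list to gather the "col2:count" pairs into an array, join the array with a comma as the delimiter using concat_ws, and convert the resulting string into a map with str_to_map.
Demo: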

with data_example as (
select stack (8, --number of tuples
 'a','A'
,'a','B'
,'a','A'
,'a','B'
,'b','A'
,'b','B'
,'b','C'
,'b','C'
) as (COL1,COL2)
)

select col1, str_to_map(concat_ws(',',collect_set(concat(col2,':',cnt)))) histogram
from
(
select col1, col2, count(*) cnt from data_example group by col1, col2
)s
group by col1
;

Result:

col1    histogram
a   {"A":"2","B":"2"}
b   {"A":"1","B":"1","C":"2"}
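To run the same aggregation against the table from the question instead of the inline demo data, something like the following should work. This is only a sketch: it assumes the table is named sample_table (as in the Presto query above) and that COL2 values contain neither ',' nor ':', since those characters are used as the delimiters passed to str_to_map.

```sql
-- Sketch only: sample_table is the table name taken from the question.
-- Assumes COL2 values never contain ',' or ':' (the delimiters used below).
select col1,
       str_to_map(
         concat_ws(',', collect_list(concat(col2, ':', cast(cnt as string)))),
         ',',   -- delimiter between key:value pairs
         ':'    -- delimiter between key and value
       ) as histogram
from (
  -- step 1: one row per (col1, col2) pair with its count
  select col1, col2, count(*) as cnt
  from sample_table
  group by col1, col2
) s
group by col1;  -- step 2: fold the pairs for each col1 into one map
```

Because the subquery already yields exactly one row per (col1, col2) pair, collect_list and collect_set produce the same entries here. Note that str_to_map returns map<string,string>, so the counts come back as strings, as in the result above, rather than as the integers shown in the desired output.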

Answered 2021-06-24
