sql - Partition after a group by in Hive

Posted by abithluo on 2021-06-26 in Hive


Suppose there is a table with some data and a column that holds a date:

column1, column2, date
a, a, 2016
a, b, 2016
a, c, 2017
b, d, 2017
b, e, 2017

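The goal is to count the column2 occurrences for each column1 value and to get the minimum date for each column1 value.
The first part is a simple group by. The second can be obtained with a partition by clause. But how can I combine the two neatly and cleanly? Is a partition really needed to get the minimum date? Any smart advice would be great!
Expected output: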

column1, count, min_date
a, 3, 2016
b, 2, 2017

sql Hive

Source: https://stackoverflow.com/questions/46580637/partition-after-a-group-by-in-hive

1 Answer


mwkjh3gx1#

A simple group by is enough:

select column1, 
       count(distinct column2) count, --counts distinct non-null column2 values per column1;
                                      --remove distinct to count all non-null column2 values,
                                      --or use count(*) to count all rows per column1
       min(date)               min_date
from table
group by column1
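
The three counting options mentioned in the comments differ only when column2 contains duplicates or NULLs. A minimal sketch on made-up rows follows; the value 'x', the NULL row, and the aliases distinct_cnt, non_null_cnt and row_cnt are assumptions for illustration and are not part of the question's data:

```sql
-- Hypothetical data: two identical column2 values and one NULL for column1 = 'a'.
select column1,
       count(distinct column2) as distinct_cnt, -- 1: distinct non-null values
       count(column2)          as non_null_cnt, -- 2: all non-null values
       count(*)                as row_cnt       -- 3: all rows, NULLs included
from (
    select stack(3,
        'a', 'x',
        'a', 'x',
        'a', cast(null as string)) as (column1, column2)
) s
group by column1
```

On the question's sample data all three variants give the same counts, because column2 has no duplicates or NULLs within a column1 group.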

Let's test the group by query on sample data:

select column1, 
       count(distinct column2) count, --counts distinct non-null column2 values per column1;
                                      --remove distinct to count all non-null column2 values,
                                      --or use count(*) to count all rows per column1
       min(date)               min_date
from (
    select 
    stack(6,
        'a','a', 2016, 
        'a','b', 2016, 
        'a','c', 2017, 
        'b','d', 2017, 
        'b','e', 2017, 
        'c','e', 2015) as (column1, column2, date)
) s
group by column1

Result:

a   3   2016    
b   2   2017    
c   1   2015

Note that min_date holds the minimum date for each column1 value, so no partition by is needed.
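
For comparison, here is a sketch of the partition by approach the question had in mind. Window functions return one row per input row, so the per-group values repeat on every row and have to be deduplicated afterwards. This sketch does that with a hypothetical helper column rn built from row_number(). The alias cnt and the backticks around date are also assumptions, the latter for Hive versions that treat date as a reserved keyword. It is meant only to illustrate why the plain group by is the shorter route, not as a recommended solution.

```sql
-- Sketch only: window-function version of the same aggregation.
-- rn is a hypothetical helper column used to keep one row per column1.
select column1, cnt, min_date
from (
    select column1,
           count(column2) over (partition by column1) as cnt,      -- non-null column2 values per column1
           min(`date`)    over (partition by column1) as min_date, -- minimum date per column1
           row_number()   over (partition by column1 order by column2) as rn
    from (
        select stack(6,
            'a','a', 2016,
            'a','b', 2016,
            'a','c', 2017,
            'b','d', 2017,
            'b','e', 2017,
            'c','e', 2015) as (column1, column2, `date`)
    ) s
) t
where rn = 1
```

Here count(column2) counts non-null values rather than distinct ones. On this sample data the two coincide, so the query should return the same three rows as the group by version.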

Answered on 2021-06-26
