How to group by specific SQL columns and retrieve the rows with the highest counts for those columns?

Posted by jgovgodb on 2021-06-25 in Hive


I have the following data:

col_1 | col_2 | col_3 | col_4
-----------------------------
a1      b1      c1      d1
a1      b2      c1      d1
a1      b3      c1      d1
a1      b4      c1      d2
a1      b5      c2      d2
a1      b6      c2      d2
a1      b7      c1      d3
a1      b8      c2      d3
a1      b9      c3      d3
a1      b10     c1      d2
a1      b11     c2      d3
a2      b12     c1      d1
a3      b13     c1      d1

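I want to be able to:
Return rows in which each col_1 value is unique.
For each row in the result, return the values of the columns that have the highest count when grouping by col_3, col_4.
For example, I would like the output to be: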

col_1 | col_2 | col_3 | col_4
-----------------------------
a1      b1      c1      d1
a2      b12     c1      d1
a3      b13     c1      d1

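Notice that each col_1 value in the result is unique. Also note that for a1 the row carries c1 and d1, because that combination has the highest count for a1.
How can I achieve this with a SQL query? I will be using it in a Hive SQL query.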
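
To make the expected result concrete, the group sizes can be inspected with a plain aggregate. This is only a sketch: the table name mytable is an assumption, since the question does not name the table.

```sql
-- Assumption: the sample rows live in a table called mytable (hypothetical name).
-- Counts how many rows share each (col_3, col_4) pair within a col_1 value.
select col_1, col_3, col_4, count(*) as cnt
from mytable
group by col_1, col_3, col_4;
```

For the sample data, a1 has 3 rows with (c1, d1), 2 rows each with (c1, d2), (c2, d2), and (c2, d3), and 1 row each with (c1, d3) and (c3, d3), so (c1, d1) is the largest group. a2 and a3 each have a single (c1, d1) row.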

sql Hive apache-spark greatest-n-per-group hiveql

Source: https://stackoverflow.com/questions/60805027/how-to-group-specific-sql-columns-and-retrieve-rows-with-highest-counts-for-thos

3 Answers


64jmpszr1#

You can use aggregation and window functions:

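select col_1, col_2, col_3, col_4
from (
    select
        col_1,
        min(col_2) as col_2,  -- one representative col_2 per group
        col_3,
        col_4,
        -- rank the (col_3, col_4) groups within each col_1 by size; tied groups all get rank 1
        rank() over(partition by col_1 order by count(*) desc) rn
    from mytable t
    -- col_2 is kept out of the grouping: it is unique per row, so grouping by it would make every count 1
    group by col_1, col_3, col_4
) t
where rn = 1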
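
Some engines, and as far as I know Hive releases before 2.1, do not accept an aggregate such as count(*) inside the OVER clause. Below is a sketch of an equivalent form that computes the counts in an inner query first. The table name mytable is taken from the query above; the aliases g, r, and cnt are names chosen here for illustration.

```sql
select col_1, col_2, col_3, col_4
from (
    -- step 2: rank the groups within each col_1 by their size
    select g.*,
           rank() over (partition by col_1 order by cnt desc) as rn
    from (
        -- step 1: size of each (col_3, col_4) group within col_1
        select col_1, min(col_2) as col_2, col_3, col_4, count(*) as cnt
        from mytable
        group by col_1, col_3, col_4
    ) g
) r
where rn = 1;
```

Splitting the work into two steps costs one extra level of nesting, but the window function then only sees an ordinary column, which should be portable across more SQL dialects.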

2021-06-26

hzbexzde2#

With the row_number() window function:

select t.col_1, t.col_2, t.col_3, t.col_4
from (
  select col_1, min(col_2) col_2, col_3, col_4,
    row_number() over (partition by col_1 order by count(*) desc) rn
  from tablename
  group by col_1, col_3, col_4
) t
where t.rn = 1

See the demo.
Results:

| col_1 | col_2 | col_3 | col_4 |
| ----- | ----- | ----- | ----- |
| a1    | b1    | c1    | d1    |
| a2    | b12   | c1    | d1    |
| a3    | b13   | c1    | d1    |
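
When two (col_3, col_4) groups tie for the highest count within a col_1, row_number() with only count(*) in the ORDER BY picks one of them arbitrarily. Here is a sketch that adds tie-breakers so the choice is repeatable. Breaking ties by col_3 and then col_4 is an assumption made here, not something the question specifies.

```sql
select t.col_1, t.col_2, t.col_3, t.col_4
from (
  select col_1, min(col_2) col_2, col_3, col_4,
    row_number() over (
      partition by col_1
      order by count(*) desc, col_3, col_4  -- assumed tie-break order
    ) rn
  from tablename
  group by col_1, col_3, col_4
) t
where t.rn = 1
```

On the sample data there is no tie for first place, so the result is unchanged. Note also that min(col_2) compares strings, so a value such as b10 sorts before b2.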

2021-06-26

7eumitmz3#

If you need the complete rows, you can use window functions:

select t.*
from (select t.*,
             rank() over (partition by col_1 order by cnt desc) as seqnum
      from (select t.*, count(*) over (partition by col_1, col_3, col_4) as cnt
            from t
           ) t
     ) t
where seqnum = 1;

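The innermost subquery counts the rows for each col_1/col_3/col_4 combination. The middle subquery ranks the rows within each col_1 by that count, highest first. The outermost query filters for the rows with the highest count.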
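
Because rank() keeps ties, this approach returns every row of the winning group (for a1 in the sample data that is b1, b2, and b3). If a single full row per col_1 is wanted, one option is to switch to row_number() with a tie-breaker. The sketch below uses the column names from the question and the table name t from the answer; ordering by col_2 last and the aliases c and r are assumptions made for illustration.

```sql
select col_1, col_2, col_3, col_4
from (select c.*,
             -- largest group first; col_3, col_4 keep tied groups apart,
             -- col_2 picks one row inside the chosen group (assumed order)
             row_number() over (partition by col_1
                                order by cnt desc, col_3, col_4, col_2) as seqnum
      from (select t.*,
                   count(*) over (partition by col_1, col_3, col_4) as cnt
            from t
           ) c
     ) r
where seqnum = 1;
```

For the sample data this keeps b1 as the a1 row, which matches the output requested in the question.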

2021-06-26
