hivesql内部连接-如何在同一查询中获取sum和row\u num

mfuanj7w  于 2021-06-27  发布在  Hive
关注(0)|答案(2)|浏览(365)

我有两个表product和psales,表中的数据如下所示

select * from psales;
+-------------+---------------+--+
| psales.pid  | psales.sales  |
+-------------+---------------+--+
| 1           | 100           |
| 1           | 150           |
| 1           | 200           |
| 2           | 75            |
| 2           | 45            |
| 2           | 145           |
| 3           | 176           |
| 3           | 99            |
| 1           | 27            |
| 4           | 51            |
+-------------+---------------+--+

select * from product;
+--------------+----------------+--+
| product.pid  | product.pname  |
+--------------+----------------+--+
| 1            | p1             |
| 2            | p2             |
| 3            | p3             |
| 4            | p4             |
+--------------+----------------+--+

目标是获得综合销售额第二高的产品。
这是我目前用来获得最高综合销售额的产品的查询(运行良好)

select p1.pname, p1.total_sales
from (select p.pid as pid, p.pname as pname, s.sales as sales,  
      sum(s.sales) over (partition by p.pid order by p.pid) as total_sales 
      from product p
      inner join psales s on (p.pid = s.pid) 
      order by total_sales desc) p1 
limit 1;

如何获得综合销售额第二高的产品?
当我尝试在内部查询中获取行\ num时,它给出如下错误:

select p1.pname as pname, p1.total_sales as total_sales, row_num() over (partition by pname order by pname) as rownum 
from (select p.pid as pid, p.pname as pname, s.sales as sales,  
      sum(s.sales) over (partition by p.pid order by p.pid) as total_sales, 
      row_num() over (partition by p.pid) as rownum 
      from product p 
      inner join psales s on (p.pid = s.pid) 
      order by total_sales desc) p1 
where rownum =2;

错误:编译语句时出错:失败:semanticexception无法将窗口调用拆分为组。至少有一个组只能依赖于输入列。还要检查循环依赖关系。基本错误:函数行数无效(状态=42000,代码=40000)
提前谢谢你的帮助。

3pvhb19x

3pvhb19x1#

你可以用 dense_rank 要对每个pid的和进行排序,请命名组合。

select p1.pname,p1.pid,p1.total_sales
from (select p.pid, p.pname,sum(s.sales) as total_sales,
      dense_rank() over(order by sum(s.sales) desc) as rnk 
      from product p 
      join psales s on p.pid = s.pid
      group by p.pid,p.pname
     ) p1
where rnk=2
t8e9dugd

t8e9dugd2#

使用 row_number() 函数。看来你不需要分析 sum() ,简单 group by 将做:

select p1.pname, p1.pid, p1.total_sales
  from
(
select p1.pname, p1.pid, p1.total_sales, 
       row_number() over (order by total_sales  desc) rn
  from 
     (select p.pid, p.pname, sum(s.sales) as total_sales 
        from product p 
             inner join psales s on p.pid = s.pid
        group by p.pid, p.pname
     )p1
)s
where rn=2
;

使用 dense_rank() 而不是 row_number() 如果要选择具有相同销售额的所有产品。

相关问题