Windowing function avg in Hive with over (order by colname)

Posted by mzsu5hc0 on 2021-06-27 in Hive


I am trying to understand how the windowing function avg works, but it does not behave the way I expect.
Here is the data set:

select * from winsales;
+-------------------+------------------+--------------------+-------------------+---------------+-----------------------+--+
| winsales.salesid  | winsales.dateid  | winsales.sellerid  | winsales.buyerid  | winsales.qty  | winsales.qty_shipped  |
+-------------------+------------------+--------------------+-------------------+---------------+-----------------------+--+
| 30001             | NULL             | 3                  | b                 | 10            | 10                    |
| 10001             | NULL             | 1                  | c                 | 10            | 10                    |
| 10005             | NULL             | 1                  | a                 | 30            | NULL                  |
| 40001             | NULL             | 4                  | a                 | 40            | NULL                  |
| 20001             | NULL             | 2                  | b                 | 20            | 20                    |
| 40005             | NULL             | 4                  | a                 | 10            | 10                    |
| 20002             | NULL             | 2                  | c                 | 20            | 20                    |
| 30003             | NULL             | 3                  | b                 | 15            | NULL                  |
| 30004             | NULL             | 3                  | b                 | 20            | NULL                  |
| 30007             | NULL             | 3                  | c                 | 30            | NULL                  |
| 30001             | NULL             | 3                  | b                 | 10            | 10                    |
+-------------------+------------------+--------------------+-------------------+---------------+-----------------------+--+

When I run the following query ->

select salesid, sellerid, qty, avg(qty) over (order by sellerid) as avg_qty from winsales order by sellerid,salesid;

I get the following result ->

+----------+-----------+------+---------------------+--+
| salesid  | sellerid  | qty  |       avg_qty       |
+----------+-----------+------+---------------------+--+
| 10001    | 1         | 10   | 20.0                |
| 10005    | 1         | 30   | 20.0                |
| 20001    | 2         | 20   | 20.0                |
| 20002    | 2         | 20   | 20.0                |
| 30001    | 3         | 10   | 18.333333333333332  |
| 30001    | 3         | 10   | 18.333333333333332  |
| 30003    | 3         | 15   | 18.333333333333332  |
| 30004    | 3         | 20   | 18.333333333333332  |
| 30007    | 3         | 30   | 18.333333333333332  |
| 40001    | 4         | 40   | 19.545454545454547  |
| 40005    | 4         | 10   | 19.545454545454547  |
+----------+-----------+------+---------------------+--+

The question is how avg_qty is being calculated. Since I am not using partition by, I expected avg(qty) to be the same for all rows.
Any ideas?

Hive windowing

Source: https://stackoverflow.com/questions/52804690/windowing-function-avg-in-hive-with-over-order-by-colname

1 answer


k4aesqcs1#

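If you want the same avg(qty) on every row, remove order by sellerid from the over clause; every row then gets 215/11, roughly 19.545.
Query that returns the same avg(qty) for all rows: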

hive> select salesid, sellerid, qty, avg(qty) over () as avg_qty from winsales order by sellerid,salesid;
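
The overall average can be checked by hand. Below is a minimal Python sketch, assuming the qty values are copied from the winsales table in the question; the variable names are illustrative and not part of Hive.

```python
# qty values copied from the winsales table in the question (11 rows,
# in the order they appear in the select * output).
qty = [10, 10, 30, 40, 20, 10, 20, 15, 20, 30, 10]

# With an empty over (), the window is the whole result set,
# so every row receives this single value.
overall_avg = sum(qty) / len(qty)

print(sum(qty), len(qty), round(overall_avg, 3))
```

The sum is 215 over 11 rows, which is the same value the original query reports for sellerid 4, the last group in the ordering.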

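If order by sellerid is included in the over clause, you get a cumulative (running) average by sellerid. With order by and no explicit window frame, Hive defaults the frame to range between unbounded preceding and current row, so each row is averaged together with every row whose sellerid is less than or equal to its own, and rows that share a sellerid get the same value. That is:

sellerid 1: 2 records (qty 10, 30), 2 records in total so far, so avg is
     (10+30)/2 = 20.0
sellerid 2: 2 records (qty 20, 20), 4 records in total so far, so avg is
     (10+30+20+20)/4 = 20.0
sellerid 3: 5 records (qty 10, 10, 15, 20, 30), 9 records in total so far, so avg is
     (10+30+20+20+10+10+15+20+30)/9 = 18.333
sellerid 4: 2 records (qty 40, 10), 11 records in total, so avg is
     (10+30+20+20+10+10+15+20+30+40+10)/11 = 19.545454545454547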

This is the expected behavior in Hive when order by is included in the over clause.
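
The running average per sellerid can be reproduced outside Hive. The following is a rough Python sketch, not Hive's implementation: it assumes the rows are copied from the winsales table in the question, and running_avg_by_key is a hypothetical helper that treats all rows with the same sellerid as one group added to the window at once.

```python
from itertools import groupby

# (salesid, sellerid, qty) rows copied from the winsales table in the question.
rows = [
    (30001, 3, 10), (10001, 1, 10), (10005, 1, 30), (40001, 4, 40),
    (20001, 2, 20), (40005, 4, 10), (20002, 2, 20), (30003, 3, 15),
    (30004, 3, 20), (30007, 3, 30), (30001, 3, 10),
]

def running_avg_by_key(rows):
    """Mimic avg(qty) over (order by sellerid): every row of a sellerid
    sees the average of all rows up to and including its own group."""
    result = {}
    total = 0
    count = 0
    ordered = sorted(rows, key=lambda r: r[1])
    for sellerid, group in groupby(ordered, key=lambda r: r[1]):
        # Add the whole group before computing, so rows with equal
        # sellerid all get the same value.
        for _, _, qty in group:
            total += qty
            count += 1
        result[sellerid] = total / count
    return result

for sellerid, avg in running_avg_by_key(rows).items():
    print(sellerid, avg)
```

The four values printed should line up with the avg_qty column in the question's output: one value per sellerid, with the last one equal to the average over all 11 rows.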
