How to implement percentile in Hive

Posted by z6psavjg on 2021-05-29 in Hadoop


I have a table like this in Hive:

user_id     no. of game_plays
u1           52
u2           190
u10          166
u9           100
u3           90
u4           44
u5           21
u7           10
u8           5

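The above is just a very small sample of the data.
So the total number of game plays is 678.
I want to work out which users fall into each group, as follows: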

who contribute to top 33.3% of total game_plays and 
who contribute to between 33.3% and 66.6% of total game_plays 
who contribute to bottom 33.3% of total game_plays

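Basically, split the data into 3 groups as listed above, and pick the top 20 users from each group.
I know the logic for implementing this in BigQuery: get the percentile values ordered by game plays, put a case statement in an outer query on top of that, rank by game plays within each group, and then select rank <= 20.
That is the result I want.
I don't know how to implement this kind of thing in Hive.
I looked at the pages below, but could not work out how to write it:
How to implement percentile in Hive?
Calculating median in Hive
I also went through the function reference at the link below,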
https://cwiki.apache.org/confluence/display/hive/languagemanual+types
I know I have to use the percentile function... but I'm not sure how to implement it.
Below is the code I tried,

select a.user_id,a.game_plays, percentile(a.game_plays,0.66) as percentile
from (
select user_id, sum(game_plays) as game_plays
from game_play_table
where data_date = '2019-06-01' 
group by user_id) a

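I know the code above does not give the exact output I want; the plan was to write an outer query on top of it to get there. But the output of the query above is itself very different from what I expected.
Can anyone help?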
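One likely reason the attempt above misbehaves is that `percentile` is an aggregate function: it collapses all input rows into a single value, so it cannot sit next to the non-grouped columns `a.user_id` and `a.game_plays`. A minimal sketch of one way around this, assuming `game_plays` is an integer column (Hive's `percentile` takes integral input; `percentile_approx` is the usual choice for doubles), is to compute the cut-offs once and cross join them onto every user row. The CTE names `per_user` and `cutoffs` and the aliases `p33` and `p66` are illustrative, not from the original post.

```sql
-- Sketch only: compute the cut-off values once, then attach them to every user row.
WITH per_user AS (
  SELECT user_id, SUM(game_plays) AS game_plays
  FROM game_play_table
  WHERE data_date = '2019-06-01'
  GROUP BY user_id
),
cutoffs AS (
  -- percentile() is an aggregate: each call returns one value for the whole input
  SELECT percentile(game_plays, 0.33) AS p33,
         percentile(game_plays, 0.66) AS p66
  FROM per_user
)
SELECT u.user_id, u.game_plays, c.p33, c.p66
FROM per_user u
CROSS JOIN cutoffs c;
```

Note that these cut-offs split users by their position in the distribution of `game_plays` values, which is not the same as splitting by share of the total number of game plays.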

hadoop Hive Percentile

Source: https://stackoverflow.com/questions/56920385/how-to-implement-percentile-in-hive

1 answer


w6mmgewl1#

You can use "case" to compute each user's percentage of the total:

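-- 678 is the total number of game plays in the sample data
-- branches are ordered from highest to lowest so that each one is reachable
select user_id, game_plays,
case when game_plays * 100.0 / 678 > 66.6 then 'top 33.3%'
     when game_plays * 100.0 / 678 > 33.3 then 'between 33.3% and 66.6%'
     else 'less than 33.3%'
end as percentile
from (
  -- aggregate first so that game_plays is one value per user
  select user_id, sum(game_plays) as game_plays
  from game_play_table
  where data_date = '2019-06-01'
  group by user_id
) a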
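The grouping the question describes (users who together account for the top, middle, and bottom third of total game plays, then the top 20 per group) can also be sketched with window functions. This is only one possible reading of the requirement: it assumes buckets are assigned by cumulative share when users are ordered from most to least plays, and it uses `ROW_NUMBER` rather than `RANK` so that ties cannot return more than 20 rows. The CTE names and the `cum_plays`, `total_plays`, `bucket`, and `rn` aliases are illustrative.

```sql
-- Sketch only: bucket users by cumulative share of total game plays,
-- then keep the 20 heaviest users in each bucket.
WITH per_user AS (
  SELECT user_id, SUM(game_plays) AS game_plays
  FROM game_play_table
  WHERE data_date = '2019-06-01'
  GROUP BY user_id
),
running AS (
  SELECT user_id,
         game_plays,
         -- running total from the heaviest user downwards (user_id breaks ties)
         SUM(game_plays) OVER (ORDER BY game_plays DESC, user_id
                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_plays,
         -- grand total, computed from the data instead of hard-coding 678
         SUM(game_plays) OVER () AS total_plays
  FROM per_user
),
bucketed AS (
  SELECT user_id,
         game_plays,
         CASE
           WHEN cum_plays * 100.0 / total_plays <= 33.3 THEN 'top 33.3%'
           WHEN cum_plays * 100.0 / total_plays <= 66.6 THEN 'between 33.3% and 66.6%'
           ELSE 'bottom 33.3%'
         END AS bucket
  FROM running
),
ranked AS (
  SELECT user_id,
         game_plays,
         bucket,
         ROW_NUMBER() OVER (PARTITION BY bucket ORDER BY game_plays DESC, user_id) AS rn
  FROM bucketed
)
SELECT user_id, game_plays, bucket
FROM ranked
WHERE rn <= 20;
```

How a user who straddles a boundary is assigned (here, by the cumulative share after adding that user) is a design choice; switching the comparison to the cumulative share before adding the user would move boundary users into the higher bucket.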

2021-05-29
