I have a dataset containing student IDs and ages. I want to label each age with the range (bucket) it falls into, using a bucket size of 10.
stud_id ages
101 11
102 13
103 21
104 25
I have many more records like these. The bin size must be 10.
The expected output is:
stud_id ages_bin
101 11-20
102 11-20
103 21-30
104 21-30
I tried a simple CASE statement in Hive:
select stud_id,
case when ages between 0 and 10 then '0-10'
when ages between 11 and 20 then '11-20'
when ages between 21 and 30 then '21-30'
when ages between 31 and 40 then '31-40'
when ages between 41 and 50 then '41-50'
when ages between 51 and 60 then '51-60'
when ages between 61 and 70 then '61-70'
when ages between 71 and 80 then '71-80'
when ages between 81 and 90 then '81-90'
when ages between 91 and 100 then '91-100'
when ages between 101 and 110 then '101-110'
when ages between 111 and 120 then '111-120'
when ages between 121 and 130 then '121-130'
when ages between 131 and 140 then '131-140'
when ages between 141 and 150 then '141-150'
else NULL end as ages_bin
from students
Is there a simpler way to bin the data with a bucket size of 10?
Can someone help me write some simple code for this?
2 Answers
avkwfej4 #1
There is a simple way to compute the histogram bin ranges. The code is as follows:
This produces the following output:
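As an illustration only, and not necessarily the query this answer originally contained, here is a minimal sketch of computing the bin arithmetically in Hive. It assumes the `students` table and an integer `ages` column from the question, and keeps the question's bin edges (0-10, then 11-20, 21-30, and so on). It uses the built-in `floor`, `concat` and `cast` functions.

```sql
-- Sketch: derive the bin label from the age instead of listing every range.
-- Assumption: `ages` is an integer column; NULL or negative ages map to NULL.
select stud_id,
       case
         when ages is null or ages < 0 then NULL
         -- the first bin also holds age 0, matching the original CASE statement
         when ages <= 10 then '0-10'
         -- for ages >= 11: lower edge = floor((ages - 1) / 10) * 10 + 1
         else concat(cast(floor((ages - 1) / 10) * 10 + 1 as string),
                     '-',
                     cast(floor((ages - 1) / 10) * 10 + 10 as string))
       end as ages_bin
from students;
```

For the four sample rows in the question (ages 11, 13, 21, 25) this yields 11-20, 11-20, 21-30 and 21-30. Unlike the original CASE statement, the sketch has no upper limit at 150; ages above that simply continue into 151-160 and beyond.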
rhfm7lfc #2
Try this. It should return the bins in range format:
To get proper output, it is best to group by the bin and order the results accordingly.
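A hedged sketch of what grouping and ordering by bin could look like, again assuming the `students` table and integer `ages` column from the question. Counting students per bin is an assumption about the intended grouping, and the names `bin_idx`, `num_students` and `binned` are made up for this example. The numeric `bin_idx` is used for sorting because sorting the string labels would place '101-110' before '11-20'.

```sql
-- Sketch: count students per bin and sort the bins numerically.
-- bin_idx is a hypothetical helper: 0 for ages 0-10, 1 for 11-20, 2 for 21-30, ...
select bin_idx,
       concat(cast(if(bin_idx = 0, 0, bin_idx * 10 + 1) as string),
              '-',
              cast(bin_idx * 10 + 10 as string)) as ages_bin,
       count(*) as num_students
from (
  select case
           when ages = 0 then 0              -- keep age 0 in the first bin
           else floor((ages - 1) / 10)
         end as bin_idx
  from students
  where ages >= 0                            -- drops NULL and negative ages
) binned
group by bin_idx
order by bin_idx;
```

With the four sample rows this gives two groups, 11-20 and 21-30, each with a count of 2.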