使用impala获取连续行程的计数

dy2hfwbg  于 2021-06-27  发布在  Hive
关注(0)|答案(1)|浏览(446)

样本数据

touristid|day
ABC|1
ABC|1
ABC|2
ABC|4
ABC|5
ABC|6
ABC|8
ABC|10

输出应该是

touristid|trip
ABC|4

4后面的逻辑是连续天数不同连续天数的计数sqq 1,1,2是第一个,然后是4,5,6是第二个,然后是8是第三个,10是第四个我想用impala查询输出这个结果

n6lpvg4x

n6lpvg4x1#

使用lag()函数获取前一天,如果前一天>1,则计算新的\u trip \u标志,然后计数(新的\u trip \u标志)。
演示:

with table1 as (
select 'ABC' as touristid, 1  as day union all
select 'ABC' as touristid, 1  as day union all
select 'ABC' as touristid, 2  as day union all
select 'ABC' as touristid, 4  as day union all
select 'ABC' as touristid, 5  as day union all
select 'ABC' as touristid, 6  as day union all
select 'ABC' as touristid, 8  as day union all
select 'ABC' as touristid, 10 as day 
)

select touristid, count(new_trip_flag) trip_cnt
  from 
       ( -- calculate new_trip_flag
         select touristid,
                case when (day-prev_day) > 1 or prev_day is NULL then true end  new_trip_flag
           from       
                ( -- get prev_day
                  select touristid, day, 
                         lag(day) over(partition by touristid order by day) prev_day
                    from table1
                )s
        )s
 group by touristid;

结果:

touristid   trip_cnt    
ABC         4

同样的方法也适用于Hive。

相关问题