无顺序保证的条件超前/滞后

wa7juj8i  于 2021-06-24  发布在  Hive
关注(0)|答案(2)|浏览(396)

如果前面的或进行中的超前/滞后不能保证满足某个条件,那么如何编写条件超前/滞后?就我而言,我关注的是网站流量。
示例数据(prior\u path和prior\u event是在给定条件下无法访问prior\u event的目标字段)

+-----------+-----+-------+---------------------------------------+-------+------------------------------------+-------------+
| sessionid | hit | type  |                 path                  | event |             prior_path             | prior_event |
+-----------+-----+-------+---------------------------------------+-------+------------------------------------+-------------+
|      1001 |   1 | event | www.stackoverflow.com                 | hover |                                    |             |
|      1001 |   2 | page  | www.stackoverflow.com                 |       |                                    | hover       |
|      1001 |   3 | event | www.stackoverflow.com                 | load  |                                    |             |
|      1001 |   4 | event | www.stackoverflow.com                 | blur  |                                    | load        |
|      1001 |   5 | event | www.stackoverflow.com                 | click |                                    | blur        |
|      1001 |   6 | page  | www.stackoverflow.com/post/10         |       | www.stackoverflow.com              | click       |
|      1001 |   7 | event | www.stackoverflow.com/post/10#details | offer |                                    |             |
|      1001 |   8 | page  | www.stackoverflow.com/post/confirm    |       | www.stackoverflow.com/post/10      | offer       |
|      1001 |   9 | page  | www.stackoverflow.com/questions/10    |       | www.stackoverflow.com/post/confirm | offer       |
|      1001 |  10 | event | www.stackoverflow.com/questions/10    | exit  |                                    |             |
+-----------+-----+-------+---------------------------------------+-------+------------------------------------+-------------+

prior\u path:最后一个路径,其中type=page仅适用于页面命中类型prior\u event:最后一个事件,其中type=event适用于所有命中类型
注意,对于命中8和命中9,“offer”事件会重复出现,因为它们会导致这些页面。
我能做到的似乎是直截了当的

SELECT LAG(path) OVER (PARTITION BY sessionid, type ORDER BY hit) FROM my_table

但我不知道如何得到事先的事件。

p4tfgftt

p4tfgftt1#

你有正确的表达方式 prior_path 已经有了。你只需要把它 Package 成一个条件表达式。
至于 prior_event ,确实有点复杂。我建议采取以下办法:
对于事件,我们可以使用 lag() 对于页面,一个选项是使用一些间隙和孤岛技术:首先使用每次满足事件时递增的条件和定义组,然后使用 first_value() :
这应该是你想要的:

select  
    t.*,
    case when type = 'page'
        then lag(path) over(partition by sessionid, type  order by hit)
    end prior_path,
    case type 
        when 'page'
            then first_value(event) over(partition by sessionid, grp order by hit)
        when 'event' 
            then lag(event) over(partition by sessionid order by hit)
        end prior_event
from (
    select 
        t.*,
        sum(case when type = 'event' then 1 else 0 end) 
            over(partition by sessionid order by hit) grp
    from mytable t
) t

db-fiddle上的演示(由于野外缺少hive-fiddle,我使用了postgres,但这也适用于hive):

sessionid | hit | type  | path                                  | event | grp | prior_path                         | prior_event
--------: | --: | :---- | :------------------------------------ | :---- | --: | :--------------------------------- | :----------
     1001 |   1 | event | www.stackoverflow.com                 | hover |   1 | null                               | null       
     1001 |   2 | page  | www.stackoverflow.com                 | null  |   1 | null                               | hover      
     1001 |   3 | event | www.stackoverflow.com                 | load  |   2 | null                               | null       
     1001 |   4 | event | www.stackoverflow.com                 | blur  |   3 | null                               | load       
     1001 |   5 | event | www.stackoverflow.com                 | click |   4 | null                               | blur       
     1001 |   6 | page  | www.stackoverflow.com/post/10         | null  |   4 | www.stackoverflow.com              | click      
     1001 |   7 | event | www.stackoverflow.com/post/10#details | offer |   5 | null                               | null       
     1001 |   8 | page  | www.stackoverflow.com/post/confirm    | null  |   5 | www.stackoverflow.com/post/10      | offer      
     1001 |   9 | page  | www.stackoverflow.com/questions/10    | null  |   5 | www.stackoverflow.com/post/confirm | offer      
     1001 |  10 | event | www.stackoverflow.com/questions/10    | exit  |   6 | null                               | null
laik7k3q

laik7k3q2#

我想你只需要 lag() 还有一些条件逻辑:

select . . .,
       (case when type = 'page'
             then lag(path) over (partition by sessionid, type order by hit)
        end) as prior_path,
       lag(event) over (partition by sessionid order by hit) as prior_event
from my_table;

相关问题