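I have actions and outcomes mixed together in a single "event" field. Let's assume that each action inherently "causes" an outcome (and the outcomes that follow it). For each user, I want to link every action to the future outcome it led to.
I'd like to copy the value of the "outcome" field into a new field that holds the next outcome following each event for that user (if an outcome occurred after that action). I also want to record the time of that outcome. For example, if user_1 had three events and then a good outcome, I want each of those three events to have "good" in the new field. If user_2 had 2 actions, then a bad outcome, then 3 more actions, I want the first 2 actions to have "bad" in the event_outcome field and the last 3 actions to have "?".
- Note: the event_timestamp field does not always appear in chronological order with respect to row number. I have shown it that way here to make it easier to follow.
This is what I have: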
event_timestamp | user| event | outcome
2020-07-20 15:00:00.000 UTC | 1 | action-throw|
2020-07-20 15:01:00.000 UTC | 1 | result | good
2020-07-20 15:02:00.000 UTC | 1 | action-push |
2020-07-20 15:03:00.000 UTC | 2 | action-run |
2020-07-20 15:04:00.000 UTC | 2 | result | bad
2020-07-20 15:05:00.000 UTC | 2 | action-throw|
2020-07-20 15:06:00.000 UTC | 1 | action-push |
2020-07-20 15:07:00.000 UTC | 1 | result | bad
2020-07-20 15:08:00.000 UTC | 1 | action-push |
2020-07-20 15:09:00.000 UTC | 2 | result | good
2020-07-20 15:10:00.000 UTC | 2 | action-pull |
2020-07-20 15:11:00.000 UTC | 2 | action-push |
This is what I want:
event_timestamp | user| event | outcome | event_outcome | event_outcome_timestamp
2020-07-20 15:00:00.000 UTC | 1 | action-throw| | good | 2020-07-20 15:01:00.000 UTC
2020-07-20 15:01:00.000 UTC | 1 | result | good | good | 2020-07-20 15:01:00.000 UTC
2020-07-20 15:02:00.000 UTC | 1 | action-push | | bad | 2020-07-20 15:07:00.000 UTC
2020-07-20 15:03:00.000 UTC | 2 | action-run | | bad | 2020-07-20 15:04:00.000 UTC
2020-07-20 15:04:00.000 UTC | 2 | result | bad | bad | 2020-07-20 15:04:00.000 UTC
2020-07-20 15:05:00.000 UTC | 2 | action-throw| | good | 2020-07-20 15:09:00.000 UTC
2020-07-20 15:06:00.000 UTC | 1 | action-push | | bad | 2020-07-20 15:07:00.000 UTC
2020-07-20 15:07:00.000 UTC | 1 | result | bad | bad | 2020-07-20 15:07:00.000 UTC
2020-07-20 15:08:00.000 UTC | 1 | action-push | | ? | ?
2020-07-20 15:09:00.000 UTC | 2 | result | good | good | 2020-07-20 15:09:00.000 UTC
2020-07-20 15:10:00.000 UTC | 2 | action-pull | | ? | ?
2020-07-20 15:11:00.000 UTC | 2 | action-push | | ? | ?
I can find the latest outcome for each user (if the outcome is unknown, "?" would be preferable to null) using this code:
select *
from (
  select *
  from table  -- placeholder table name
  where event_timestamp > x  -- x: placeholder cutoff timestamp
) temp1
LEFT JOIN
(
  select user as user2, outcome as outcome_latest, event_timestamp as event_timestamp_latest_outcome
  from (
    select user, event_timestamp, outcome,
      -- rn = 1 marks each user's most recent outcome row
      row_number() over (partition by user order by UNIX_MILLIS(event_timestamp) desc) as rn
    from table
    where event_timestamp > x
      and outcome is not null
  )
  where rn = 1
) temp2
on temp1.user = temp2.user2
This is what it gives (not what I want):
event_timestamp | user| event | outcome | outcome_latest | event_timestamp_latest_outcome
2020-07-20 15:00:00.000 UTC | 1 | action-throw| | bad | 2020-07-20 15:07:00.000 UTC
2020-07-20 15:01:00.000 UTC | 1 | result | good | bad | 2020-07-20 15:07:00.000 UTC
2020-07-20 15:02:00.000 UTC | 1 | action-push | | bad | 2020-07-20 15:07:00.000 UTC
2020-07-20 15:03:00.000 UTC | 2 | action-run | | good | 2020-07-20 15:09:00.000 UTC
2020-07-20 15:04:00.000 UTC | 2 | result | bad | good | 2020-07-20 15:09:00.000 UTC
2020-07-20 15:05:00.000 UTC | 2 | action-throw| | good | 2020-07-20 15:09:00.000 UTC
2020-07-20 15:06:00.000 UTC | 1 | action-push | | bad | 2020-07-20 15:07:00.000 UTC
2020-07-20 15:07:00.000 UTC | 1 | result | bad | bad | 2020-07-20 15:07:00.000 UTC
2020-07-20 15:08:00.000 UTC | 1 | action-push | | bad | 2020-07-20 15:07:00.000 UTC
2020-07-20 15:09:00.000 UTC | 2 | result | good | good | 2020-07-20 15:09:00.000 UTC
2020-07-20 15:10:00.000 UTC | 2 | action-pull | | good | 2020-07-20 15:09:00.000 UTC
2020-07-20 15:11:00.000 UTC | 2 | action-push | | good | 2020-07-20 15:09:00.000 UTC
But I don't know where to go from here. Maybe another nested level of "partition by"? On the event field? ...
In case the context helps, we are using BigQuery. Thank you!
1 Answer
Here is how I would do it in regular SQL. I am assuming BigQuery supports the analytic functions I used.
DB Fiddle link:
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=e97ebf473ccf3c33d7c6fa62fd14e51b
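For readers who cannot open the fiddle, here is a minimal sketch of one way to produce the desired output. It is not necessarily the query behind the link. It counts, per user, how many outcomes occur at or after each row; rows with the same count share the same next outcome, and a count of 0 means no outcome has happened yet. The table name `events`, the helper column `grp`, and the function `link_outcomes` are assumptions made for illustration, and SQLite (via Python's `sqlite3`, which needs SQLite 3.25+ for window functions) is used only so the example can be run locally.

```python
import sqlite3

# Sample rows from the question: (event_timestamp, user, event, outcome).
# Timestamps are shortened to plain text; they still sort chronologically.
ROWS = [
    ("2020-07-20 15:00:00", 1, "action-throw", None),
    ("2020-07-20 15:01:00", 1, "result", "good"),
    ("2020-07-20 15:02:00", 1, "action-push", None),
    ("2020-07-20 15:03:00", 2, "action-run", None),
    ("2020-07-20 15:04:00", 2, "result", "bad"),
    ("2020-07-20 15:05:00", 2, "action-throw", None),
    ("2020-07-20 15:06:00", 1, "action-push", None),
    ("2020-07-20 15:07:00", 1, "result", "bad"),
    ("2020-07-20 15:08:00", 1, "action-push", None),
    ("2020-07-20 15:09:00", 2, "result", "good"),
    ("2020-07-20 15:10:00", 2, "action-pull", None),
    ("2020-07-20 15:11:00", 2, "action-push", None),
]

QUERY = """
WITH grouped AS (
  SELECT
    event_timestamp, user, event, outcome,
    -- Outcomes at or after this row for the same user (descending order
    -- makes the running count look "forward" in time). 0 = none yet.
    COUNT(outcome) OVER (
      PARTITION BY user ORDER BY event_timestamp DESC
    ) AS grp
  FROM events
)
SELECT
  event_timestamp, user, event, outcome,
  -- Each group with grp >= 1 contains exactly one non-null outcome,
  -- so MAX simply picks it up; groups without one fall back to '?'.
  COALESCE(MAX(outcome) OVER (PARTITION BY user, grp), '?') AS event_outcome,
  COALESCE(
    MAX(CASE WHEN outcome IS NOT NULL THEN event_timestamp END)
      OVER (PARTITION BY user, grp),
    '?'
  ) AS event_outcome_timestamp
FROM grouped
ORDER BY event_timestamp
"""


def link_outcomes(rows):
    """Load rows into an in-memory SQLite table and run the sketch query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events ("
            "event_timestamp TEXT, user INTEGER, event TEXT, outcome TEXT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in link_outcomes(ROWS):
        print(row)
```

The query relies only on `COUNT` and `MAX` as window functions, so it should carry over to BigQuery with small changes: a TIMESTAMP column would need to be cast to STRING before being combined with '?' in `COALESCE` (or left as NULL), and the sketch assumes timestamps are unique per user. BigQuery also offers `FIRST_VALUE(... IGNORE NULLS)` over a frame from the current row to the end of the partition, which may express the same idea more directly.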