sql-从其他字段中查找最新值(每个用户,出现之间)

gstyhher  于 2021-07-24  发布在  Java
关注(0)|答案(1)|浏览(241)

我把行动和结果混合在一个“事件”字段中。但是每一个行为本质上都“导致”了结果(以及随后的结果)(让我们假设)。我想为每个用户将每个未来的结果与导致它的操作联系起来。
我希望将“outcome”字段中的值复制到每个用户最新的未来结果的新字段中(如果在该操作之后出现了结果)。我想记录下结果的时间。例如,如果用户\u 1做了三个事件,然后有了一个好的结果,我希望前面三个事件中的每一个都有一个新字段中的“good”。如果用户\u 2有2个操作,则结果不好,然后有3个操作:我希望在事件\u结果字段中,前2个操作有“坏”,后3个操作有“?”。

  • 注意:事件\时间戳字段并不总是按时间顺序出现(与行号有关)。我已经展示了这种方式,使它更容易理解。

我只有这个*

event_timestamp             |   user|   event       |   outcome
2020-07-20 15:00:00.000 UTC |   1   |   action-throw|   
2020-07-20 15:01:00.000 UTC |   1   |   result      |   good
2020-07-20 15:02:00.000 UTC |   1   |   action-push |   
2020-07-20 15:03:00.000 UTC |   2   |   action-run  |   
2020-07-20 15:04:00.000 UTC |   2   |   result      |   bad
2020-07-20 15:05:00.000 UTC |   2   |   action-throw|   
2020-07-20 15:06:00.000 UTC |   1   |   action-push |   
2020-07-20 15:07:00.000 UTC |   1   |   result      |   bad
2020-07-20 15:08:00.000 UTC |   1   |   action-push |   
2020-07-20 15:09:00.000 UTC |   2   |   result      |   good
2020-07-20 15:10:00.000 UTC |   2   |   action-pull |   
2020-07-20 15:11:00.000 UTC |   2   |   action-push |

我想要的是这个

event_timestamp             |   user|   event       |   outcome |   event_outcome   |   event_outcome_timestamp
2020-07-20 15:00:00.000 UTC |   1   |   action-throw|           |   good            |   2020-07-20 15:01:00.000 UTC
2020-07-20 15:01:00.000 UTC |   1   |   result      |   good    |   good            |   2020-07-20 15:01:00.000 UTC
2020-07-20 15:02:00.000 UTC |   1   |   action-push |           |   bad             |   2020-07-20 15:07:00.000 UTC
2020-07-20 15:03:00.000 UTC |   2   |   action-run  |           |   bad             |   2020-07-20 15:04:00.000 UTC
2020-07-20 15:04:00.000 UTC |   2   |   result      |   bad     |   bad             |   2020-07-20 15:04:00.000 UTC
2020-07-20 15:05:00.000 UTC |   2   |   action-throw|           |   good            |   2020-07-20 15:09:00.000 UTC
2020-07-20 15:06:00.000 UTC |   1   |   action-push |           |   bad             |   2020-07-20 15:07:00.000 UTC
2020-07-20 15:07:00.000 UTC |   1   |   result      |   bad     |   bad             |   2020-07-20 15:07:00.000 UTC
2020-07-20 15:08:00.000 UTC |   1   |   action-push |           |   ?               |   ?
2020-07-20 15:09:00.000 UTC |   2   |   result      |   good    |   good            |   2020-07-20 15:09:00.000 UTC
2020-07-20 15:10:00.000 UTC |   2   |   action-pull |           |   ?               |   ?
2020-07-20 15:11:00.000 UTC |   2   |   action-push |           |   ?               |   ?

我可以为每个用户找到最新的结果(如果不知道结果,最好是“?”而不是null)。使用此代码:

select *
from (
    select *
    from table
    where event_timestamp > x
    ) temp1
LEFT JOIN
(select user as user2, outcome as outcome_latest, event_timestamp as event_timestamp_latest_outcome
from(
    select user, event_timestamp, outcome
    row_number() over (partition by user order by UNIXMILLIS(event_timestamp) desc) as rn
    from table
    where event_timestamp > x
    and outcome is not null
    )
where rn = 1
) temp2
on temp1.user = temp2.user2

这是什么(不是我想要的)

event_timestamp             |   user|   event       |   outcome |   outcome_latest  |   event_timestamp_latest_outcome
2020-07-20 15:00:00.000 UTC |   1   |   action-throw|           |   bad             |   2020-07-20 15:07:00.000 UTC
2020-07-20 15:01:00.000 UTC |   1   |   result      |   good    |   bad             |   2020-07-20 15:07:00.000 UTC
2020-07-20 15:02:00.000 UTC |   1   |   action-push |           |   bad             |   2020-07-20 15:07:00.000 UTC
2020-07-20 15:03:00.000 UTC |   2   |   action-run  |           |   good            |   2020-07-20 15:09:00.000 UTC
2020-07-20 15:04:00.000 UTC |   2   |   result      |   bad     |   good            |   2020-07-20 15:09:00.000 UTC
2020-07-20 15:05:00.000 UTC |   2   |   action-throw|           |   good            |   2020-07-20 15:09:00.000 UTC
2020-07-20 15:06:00.000 UTC |   1   |   action-push |           |   bad             |   2020-07-20 15:07:00.000 UTC
2020-07-20 15:07:00.000 UTC |   1   |   result      |   bad     |   bad             |   2020-07-20 15:07:00.000 UTC
2020-07-20 15:08:00.000 UTC |   1   |   action-push |           |   bad             |   2020-07-20 15:07:00.000 UTC
2020-07-20 15:09:00.000 UTC |   2   |   result      |   good    |   good            |   2020-07-20 15:09:00.000 UTC
2020-07-20 15:10:00.000 UTC |   2   |   action-pull |           |   good            |   2020-07-20 15:09:00.000 UTC
2020-07-20 15:11:00.000 UTC |   2   |   action-push |           |   good            |   2020-07-20 15:09:00.000 UTC

但我不知道接下来该怎么办。可能是另一个嵌套级别的“分区方式”?在事件域上?。。。
如果上下文有帮助的话,我们使用的是bigquery。谢谢您!

4nkexdtk

4nkexdtk1#

下面是我如何在常规sql中实现它。我假设大查询支持我使用的分析函数。

with data
 as (
select *
      ,case when (lag(outcome) over(partition by user1 order by event_timestamp) <> ' '
                  and outcome =' '
                 )/*Check if there has been a change in outcome by user*/
             OR (outcome=' ' 
                 and row_number() over(partition by user1 order by event_timestamp)=1
                )/*for the first record per user, we consider it to be a new group*/
            then
                row_number() over(partition by user1 order by event_timestamp)
            else ' '
        end as grp_val
  from t
     )
    ,computed_grp_val
     as(select *
               ,sum(case when grp_val =' ' then 0 else 1 end) over(partition by user1 order by event_timestamp) as grp_val_1
         from data
        )
select *
      ,case when max(outcome) over(partition by user1,grp_val_1)=' ' then '?'
            else max(outcome) over(partition by user1,grp_val_1)
        end as comptued_outcome
      ,max(event_timestamp) over(partition by user1,grp_val_1) as computed_event_timestamp
  from computed_grp_val
order by 1  

+-----------------------------+-------+--------------+---------+---------+-----------+------------------+-----------------------------+
|       event_timestamp       | user1 |    event     | outcome | grp_val | grp_val_1 | comptued_outcome |  computed_event_timestamp   |
+-----------------------------+-------+--------------+---------+---------+-----------+------------------+-----------------------------+
| 2020-07-20 15:00:00.0000000 |     1 | action-throw |         |       1 |         1 | good             | 2020-07-20 15:01:00.0000000 |
| 2020-07-20 15:01:00.0000000 |     1 | result       | good    |       0 |         1 | good             | 2020-07-20 15:01:00.0000000 |
| 2020-07-20 15:02:00.0000000 |     1 | action-push  |         |       3 |         2 | bad              | 2020-07-20 15:07:00.0000000 |
| 2020-07-20 15:03:00.0000000 |     2 | action-run   |         |       1 |         1 | bad              | 2020-07-20 15:04:00.0000000 |
| 2020-07-20 15:04:00.0000000 |     2 | result       | bad     |       0 |         1 | bad              | 2020-07-20 15:04:00.0000000 |
| 2020-07-20 15:05:00.0000000 |     2 | action-throw |         |       3 |         2 | good             | 2020-07-20 15:09:00.0000000 |
| 2020-07-20 15:06:00.0000000 |     1 | action-push  |         |       0 |         2 | bad              | 2020-07-20 15:07:00.0000000 |
| 2020-07-20 15:07:00.0000000 |     1 | result       | bad     |       0 |         2 | bad              | 2020-07-20 15:07:00.0000000 |
| 2020-07-20 15:08:00.0000000 |     1 | action-push  |         |       6 |         3 | ?                | 2020-07-20 15:08:00.0000000 |
| 2020-07-20 15:09:00.0000000 |     2 | result       | good    |       0 |         2 | good             | 2020-07-20 15:09:00.0000000 |
| 2020-07-20 15:10:00.0000000 |     2 | action-pull  |         |       5 |         3 | ?                | 2020-07-20 15:11:00.0000000 |
| 2020-07-20 15:11:00.0000000 |     2 | action-push  |         |       0 |         3 | ?                | 2020-07-20 15:11:00.0000000 |
+-----------------------------+-------+--------------+---------+---------+-----------+------------------+-----------------------------+

db小提琴链接
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=e97ebf473ccf3c33d7c6fa62fd14e51b

相关问题