SQL: count rows in the same table whose timestamp is earlier than a fixed row that has a specific value in a fixed column

rks48beu posted on 2021-08-09 in Java

I have two tables: events and sessions.
events:

+-----------+---------------------+------+------------+
| event_id  |      timestamp      | flag | session_id |
+-----------+---------------------+------+------------+
| kj123123j | 2020-01-01 22:51:11 |    0 |          1 |
| j24hjk234 | 2020-01-01 21:11:00 |    0 |          1 |
| kjh234khj | 2020-01-01 21:44:17 |    1 |          1 |
| 342hj24j3 | 2020-01-01 08:11:00 |    0 |          2 |
| kk1k12323 | 2020-01-01 13:55:12 |    1 |          2 |
| 890fd8sdf | 2020-01-01 20:55:14 |    0 |          2 |
+-----------+---------------------+------+------------+

sessions:

+------------+---------+
| session_id | user_id |
+------------+---------+
|          1 | 12kk    |
|          2 | 44qj    |
+------------+---------+

What I want is a table that counts, for each user, the events that occur before the flagged event (flag = 1):

+---------+-------+
| user_id | count |
+---------+-------+
| 12kk    |     1 |
| 44qj    |     1 |
+---------+-------+

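I tried two approaches:
1. Self-joining the table. I couldn't test this because it is extremely slow (the events table is very large).
2. The following query: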

WITH
  events AS (
  SELECT
    events.event_id,
    events.timestamp,
    events.user_id
  FROM
    db.events events
  LEFT JOIN
    db.users users
  ON
    events.session_id = users.session_id),
  flags AS (
  SELECT
    events.event_id,
    events.timestamp
  FROM
    db.events events
  WHERE
   events.flag is TRUE )
SELECT
  events.user_id,
  SUM(CASE
      WHEN events.timestamp < flags.timestamp THEN 1
    ELSE
    0
  END
    )
FROM
  flags
JOIN
  events
ON
  events.event_id = flags.event_id
GROUP BY
  events.user_id

The problem with the second approach is that the count column contains only 0s, which can't be right.
Can you help me fix this?
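The all-zero result can be reproduced on the sample data above. The sketch below is a simplified stand-in, not the asker's exact query: it runs on an in-memory SQLite database through Python's sqlite3 module, joins the question's sessions table instead of db.users, and writes flag = 1 instead of flag IS TRUE. Because flags is joined back to events on event_id, each flagged event is only ever paired with itself, so ev.timestamp < flags.timestamp compares a timestamp with itself and is always false.

```python
import sqlite3

# Sample data from the question; ISO-8601 text timestamps compare correctly as strings.
EVENTS = [
    ("kj123123j", "2020-01-01 22:51:11", 0, 1),
    ("j24hjk234", "2020-01-01 21:11:00", 0, 1),
    ("kjh234khj", "2020-01-01 21:44:17", 1, 1),
    ("342hj24j3", "2020-01-01 08:11:00", 0, 2),
    ("kk1k12323", "2020-01-01 13:55:12", 1, 2),
    ("890fd8sdf", "2020-01-01 20:55:14", 0, 2),
]
SESSIONS = [(1, "12kk"), (2, "44qj")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (event_id TEXT, timestamp TEXT, flag INTEGER, session_id INTEGER)")
con.execute("CREATE TABLE sessions (session_id INTEGER, user_id TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", EVENTS)
con.executemany("INSERT INTO sessions VALUES (?, ?)", SESSIONS)

# Same shape as the question's query: flagged events joined back to events ON event_id.
BUGGY = """
WITH ev AS (
  SELECT e.event_id, e.timestamp, s.user_id
  FROM events e JOIN sessions s ON e.session_id = s.session_id),
flags AS (
  SELECT event_id, timestamp FROM events WHERE flag = 1)
SELECT ev.user_id,
       SUM(CASE WHEN ev.timestamp < flags.timestamp THEN 1 ELSE 0 END) AS cnt
FROM flags JOIN ev ON ev.event_id = flags.event_id
GROUP BY ev.user_id
ORDER BY ev.user_id
"""

rows = con.execute(BUGGY).fetchall()
print(rows)  # [('12kk', 0), ('44qj', 0)]: each flag row only meets itself
```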

Answer by hyrbngr7:

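One approach uses a window function plus aggregation. However, it's not clear whether you want the count per session or per user. The per-session logic is: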

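-- countif() is BigQuery syntax; in other dialects use sum(case when ... then 1 else 0 end)
select user_id, countif(timestamp < timestamp_1)
from (select e.*, s.user_id,
             -- earliest flagged timestamp in the event's session (NULL if the session has no flag)
             min(case when e.flag = 1 then e.timestamp end) over (partition by e.session_id) as timestamp_1
      from events e join
           sessions s
           on s.session_id = e.session_id
     ) e
group by user_id;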
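As a quick check, the per-session query can be run on the question's sample data. This is a minimal sketch, assuming SQLite 3.25 or later (needed for window functions) through Python's sqlite3 module as a stand-in for the real database; since SQLite has no COUNTIF, the comparison is summed with SUM(CASE ...) instead.

```python
import sqlite3

# Sample data from the question; ISO-8601 text timestamps compare correctly as strings.
EVENTS = [
    ("kj123123j", "2020-01-01 22:51:11", 0, 1),
    ("j24hjk234", "2020-01-01 21:11:00", 0, 1),
    ("kjh234khj", "2020-01-01 21:44:17", 1, 1),
    ("342hj24j3", "2020-01-01 08:11:00", 0, 2),
    ("kk1k12323", "2020-01-01 13:55:12", 1, 2),
    ("890fd8sdf", "2020-01-01 20:55:14", 0, 2),
]
SESSIONS = [(1, "12kk"), (2, "44qj")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (event_id TEXT, timestamp TEXT, flag INTEGER, session_id INTEGER)")
con.execute("CREATE TABLE sessions (session_id INTEGER, user_id TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", EVENTS)
con.executemany("INSERT INTO sessions VALUES (?, ?)", SESSIONS)

# Per-session logic; SUM(CASE ...) stands in for BigQuery's COUNTIF.
PER_SESSION = """
SELECT user_id,
       SUM(CASE WHEN timestamp < timestamp_1 THEN 1 ELSE 0 END) AS cnt
FROM (SELECT e.*, s.user_id,
             -- earliest flagged timestamp in the event's session (NULL if none)
             MIN(CASE WHEN e.flag = 1 THEN e.timestamp END)
               OVER (PARTITION BY e.session_id) AS timestamp_1
      FROM events e JOIN sessions s ON s.session_id = e.session_id) AS t
GROUP BY user_id
ORDER BY user_id
"""

rows = con.execute(PER_SESSION).fetchall()
print(rows)  # [('12kk', 1), ('44qj', 1)]
```

This matches the expected output table in the question. A session with no flagged event gets a NULL timestamp_1, so the comparison is never true and the session contributes 0.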

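The query above counts the events before the flag within each session. If you want a per-user count across all of that user's sessions, change the min() to: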

min(case when e.flag = 1 then e.timestamp end) over (partition by s.user_id) as timestamp_1
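The two partitions only give different results when a user has more than one session. The sketch below illustrates this with SQLite (3.25 or later) via Python's sqlite3 module. It adds a HYPOTHETICAL third session for user 12kk, with invented event IDs and timestamps that are not part of the question's data, and runs the same query with each PARTITION BY choice. SUM(CASE ...) replaces BigQuery's COUNTIF.

```python
import sqlite3

# Question's sample data plus a HYPOTHETICAL session 3 for user 12kk.
EVENTS = [
    ("kj123123j", "2020-01-01 22:51:11", 0, 1),
    ("j24hjk234", "2020-01-01 21:11:00", 0, 1),
    ("kjh234khj", "2020-01-01 21:44:17", 1, 1),
    ("342hj24j3", "2020-01-01 08:11:00", 0, 2),
    ("kk1k12323", "2020-01-01 13:55:12", 1, 2),
    ("890fd8sdf", "2020-01-01 20:55:14", 0, 2),
    ("hypoA0001", "2020-01-02 09:00:00", 0, 3),  # hypothetical
    ("hypoA0002", "2020-01-02 10:00:00", 1, 3),  # hypothetical
]
SESSIONS = [(1, "12kk"), (2, "44qj"), (3, "12kk")]  # session 3 is hypothetical

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (event_id TEXT, timestamp TEXT, flag INTEGER, session_id INTEGER)")
con.execute("CREATE TABLE sessions (session_id INTEGER, user_id TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", EVENTS)
con.executemany("INSERT INTO sessions VALUES (?, ?)", SESSIONS)

# {part} selects the window partition: e.session_id (per session) or s.user_id (per user).
QUERY = """
SELECT user_id,
       SUM(CASE WHEN timestamp < timestamp_1 THEN 1 ELSE 0 END) AS cnt
FROM (SELECT e.*, s.user_id,
             MIN(CASE WHEN e.flag = 1 THEN e.timestamp END)
               OVER (PARTITION BY {part}) AS timestamp_1
      FROM events e JOIN sessions s ON s.session_id = e.session_id) AS t
GROUP BY user_id
ORDER BY user_id
"""

per_session = con.execute(QUERY.format(part="e.session_id")).fetchall()
per_user = con.execute(QUERY.format(part="s.user_id")).fetchall()
print(per_session)  # [('12kk', 2), ('44qj', 1)]
print(per_user)     # [('12kk', 1), ('44qj', 1)]
```

Per session, 12kk gets one event before the flag in session 1 and one in session 3. Per user, everything is measured against 12kk's earliest flag (2020-01-01 21:44:17), so only the 21:11:00 event counts.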
