I have two tables: events and sessions.
events:
+-----------+---------------------+------+------------+
| event_id  | timestamp           | flag | session_id |
+-----------+---------------------+------+------------+
| kj123123j | 2020-01-01 22:51:11 | 0    | 1          |
| j24hjk234 | 2020-01-01 21:11:00 | 0    | 1          |
| kjh234khj | 2020-01-01 21:44:17 | 1    | 1          |
| 342hj24j3 | 2020-01-01 08:11:00 | 0    | 2          |
| kk1k12323 | 2020-01-01 13:55:12 | 1    | 2          |
| 890fd8sdf | 2020-01-01 20:55:14 | 0    | 2          |
+-----------+---------------------+------+------------+
sessions:
+------------+---------+
| session_id | user_id |
+------------+---------+
| 1          | 12kk    |
| 2          | 44qj    |
+------------+---------+
What I want is a table that counts, for each user, the events that occurred before the flagged event (flag = 1):
+---------+-------+
| user_id | count |
+---------+-------+
| 12kk    | 1     |
| 44qj    | 1     |
+---------+-------+
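I tried two approaches:
1. Self-joining the table. I could not test this because it is very slow (the events table is very large).
2. The following query: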
WITH
  events AS (
    SELECT
      events.event_id,
      events.timestamp,
      events.user_id
    FROM
      db.events events
    LEFT JOIN
      db.users users
    ON
      events.session_id = users.session_id),
  flags AS (
    SELECT
      events.event_id,
      events.timestamp
    FROM
      db.events events
    WHERE
      events.flag is TRUE )
SELECT
  events.user_id,
  SUM(CASE
      WHEN events.timestamp < flags.timestamp THEN 1
      ELSE 0
    END)
FROM
  flags
JOIN
  events
ON
  events.event_id = flags.event_id
GROUP BY
  events.user_id
The problem with the second approach is that the count column contains only 0s, which cannot be right.
Can you help me solve this?
1 Answer
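One approach uses window functions and aggregation. However, it is not clear whether you want the count per session or per user. The per-session logic is as follows: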
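A minimal sketch of such a per-session query, wrapped in a small Python/sqlite3 harness so it can be run against the sample rows from the question. The unqualified table names `events` and `sessions`, the alias `flag_ts`, the output column `events_before_flag`, and the use of SQLite are all assumptions made for illustration; this is not necessarily the exact query the answer had in mind, and other databases may need small syntax changes.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
# Table names and the SQLite dialect are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_id TEXT, timestamp TEXT, flag INTEGER, session_id INTEGER);
CREATE TABLE sessions (session_id INTEGER, user_id TEXT);
INSERT INTO events VALUES
  ('kj123123j', '2020-01-01 22:51:11', 0, 1),
  ('j24hjk234', '2020-01-01 21:11:00', 0, 1),
  ('kjh234khj', '2020-01-01 21:44:17', 1, 1),
  ('342hj24j3', '2020-01-01 08:11:00', 0, 2),
  ('kk1k12323', '2020-01-01 13:55:12', 1, 2),
  ('890fd8sdf', '2020-01-01 20:55:14', 0, 2);
INSERT INTO sessions VALUES (1, '12kk'), (2, '44qj');
""")

# The window MIN() attaches the earliest flagged timestamp of the session
# to every event row; the outer query keeps only the earlier events.
# Sessions with no flagged event get a NULL flag_ts and drop out.
PER_SESSION_SQL = """
SELECT s.user_id, e.session_id, COUNT(*) AS events_before_flag
FROM (
    SELECT ev.*,
           MIN(CASE WHEN ev.flag = 1 THEN ev.timestamp END)
               OVER (PARTITION BY ev.session_id) AS flag_ts
    FROM events ev
) e
JOIN sessions s ON s.session_id = e.session_id
WHERE e.timestamp < e.flag_ts
GROUP BY s.user_id, e.session_id
ORDER BY s.user_id
"""

rows = conn.execute(PER_SESSION_SQL).fetchall()
print(rows)  # [('12kk', 1, 1), ('44qj', 2, 1)]
```

This reads the events table once instead of self-joining it, which matters when the table is large. It also sidesteps the cause of the all-zero result in the question: joining flags to events on event_id pairs each flagged row with itself, so `events.timestamp < flags.timestamp` can never be true, whereas here every event is compared with the flag timestamp of its own session.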
The query above counts the events before the flag in each session. If you want the count across all of a user's sessions, change the min() to: