如何使用用户事件在平板上创建用户会话事件

yiytaume  于 2021-07-15  发布在  ClickHouse
关注(0)|答案(1)|浏览(335)

在clickhouse中,我有一个带有用户id和事件的事件列表。我的目标是从这些数据中提取包含事件的会话。
我查看了timeslot函数的文档,它描述了一个与我类似的用例,但我不知道如何编写查询。
(https://clickhouse.yandex/docs/en/query_language/functions/date_time_functions/#timeslots-开始时间(Time-duration-91-size-93)
例如:
启动下一个会话的条件是超时超过30分钟
活动:

date                | user  | event
2018-12-21 00:00:00 │ user1 │ event1
2018-12-21 00:00:00 │ user2 │ event1
2018-12-21 00:15:00 │ user1 │ event2
2018-12-21 00:15:00 │ user2 │ event3
2018-12-21 00:30:00 │ user1 │ event4
2018-12-21 00:45:00 │ user1 │ event1
2018-12-21 01:45:00 │ user1 │ event2

我想要的是:

date                | user  | event  | session_id
2018-12-21 00:00:00 │ user1 │ event1 | user1_1
2018-12-21 00:00:00 │ user2 │ event1 | user2_1
2018-12-21 00:15:00 │ user1 │ event2 | user1_1
2018-12-21 00:15:00 │ user2 │ event3 | user2_1
2018-12-21 00:50:00 │ user1 │ event4 | user1_2
2018-12-21 00:55:00 │ user1 │ event1 | user1_2
2018-12-21 01:55:00 │ user1 │ event2 | user1_3

我尝试使用数组,但无法编写正确的查询。
在这里,我试着找出下一节课开始的日期

select user,
       groupArray(event) as event,
       arrayPopBack(groupArray(date)) as start_time,
       arrayPopFront(groupArray(date)) as finish_time,
       arrayMap(i -> (i + 1800), start) AS timeToNextSession,
       arrayFilter((x,y) -> x > y, finish_time,timeToNextSession) as endSessionTime
FROM user.events
group by user
zdwk9cvp

zdwk9cvp1#

让我们尝试以下查询:

SELECT r.1 date, r.2 user, r.3 event, r.4 session_id, arrayJoin(result) r
FROM (
  SELECT
      user,
      groupArray(event) AS events,
      groupArray(date) AS dates,      
      /* an array of rounded dates */
      arraySort(groupUniqArray(roundedDate)) AS roundedDateArray,
      /* an array of rounded dates shifted to 30 minutes (where 30 min taken from timeSlot-function) */
      arrayMap(i -> (i + 1800), roundedDateArray) AS shiftedRoundedDateArray,
      /* to intersect two arrays to find the dates when sessions start */
      arrayFilter(x -> (has(shiftedRoundedDateArray, x) = 0), roundedDateArray) AS sessionStartDateArray,
      /* get sessions sorted in descending order */
      arrayReverse(sessionStartDateArray) reversedSessionStartDateArray, 
      /* define the session_id  */
      arrayMap((index, sessionStart) -> (concat(toString(user), '_', toString(index)), sessionStart), arrayReverse(arrayEnumerate(reversedSessionStartDateArray)), reversedSessionStartDateArray) sessions,
      /* prepare the result */
      arrayMap(i -> (dates[i], user, events[i], arrayFirst(s -> dates[i] >= s.2, sessions).1), arrayEnumerate(events)) AS result      
  FROM
  (
      SELECT date, timeSlot(date) AS roundedDate, user, event
      FROM
      (
          /* the source data */
          SELECT
              arrayJoin([('2018-12-21 00:00:00', 'user1', 'event1'), ('2018-12-21 00:00:00', 'user2', 'event1'), ('2018-12-21 00:15:00', 'user1', 'event2'), ('2018-12-21 00:15:00', 'user2', 'event3'), ('2018-12-21 00:30:00', 'user1', 'event4'), ('2018-12-21 00:45:00', 'user1', 'event1'), ('2018-12-21 01:45:00', 'user1', 'event2')]) AS a, toDateTime(a.1) AS date, a.2 AS user, a.3 AS event
      )
      ORDER BY user
  )
  GROUP BY user)
ORDER BY user, date  

/* Result
┌────────────────date─┬─user──┬─event──┬─session_id─┬─r──────────────────────────────────────────────────┐
│ 2018-12-21 00:00:00 │ user1 │ event1 │ user1_1    │ ('2018-12-21 00:00:00','user1','event1','user1_1') │
│ 2018-12-21 00:15:00 │ user1 │ event2 │ user1_1    │ ('2018-12-21 00:15:00','user1','event2','user1_1') │
│ 2018-12-21 00:30:00 │ user1 │ event4 │ user1_1    │ ('2018-12-21 00:30:00','user1','event4','user1_1') │
│ 2018-12-21 00:45:00 │ user1 │ event1 │ user1_1    │ ('2018-12-21 00:45:00','user1','event1','user1_1') │
│ 2018-12-21 01:45:00 │ user1 │ event2 │ user1_2    │ ('2018-12-21 01:45:00','user1','event2','user1_2') │
│ 2018-12-21 00:00:00 │ user2 │ event1 │ user2_1    │ ('2018-12-21 00:00:00','user2','event1','user2_1') │
│ 2018-12-21 00:15:00 │ user2 │ event3 │ user2_1    │ ('2018-12-21 00:15:00','user2','event3','user2_1') │
└─────────────────────┴───────┴────────┴────────────┴────────────────────────────────────────────────────┘

* /

使用clickhouse进行会话定义看起来不是一种最佳方式。
在将数据写入ch之前,我会计算服务器端的会话id(客户端的计算会话id也是有效点)。让我们考虑一下这方面的一些问题:
以任何支持存储项的滑动过期策略的存储引擎为例(例如,redis)
项的键是唯一的用户id(或类似的smth)
该项存储会话id
存储到clickhouse之前的任何传入事件都应该从该存储中获取会话id
任何传入事件都可以启动在存储器中创建项(当它不存在时,启动“新会话”)或延长现有项的生存期(也称为会话)。

相关问题