sql - Time difference between two different events in BigQuery

Posted by khbbv19g on 2021-07-24 in Java

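I'm trying to calculate the time difference between two events in BigQuery (they are two custom events we set up in Firebase). The first is event_a; the second is event_b, which fires some time after event_a (whenever that may be).
I tried the following query: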

SELECT ROUND(AVG(time_diff), 2) AS avg_duration_minutes
FROM (
  SELECT
    user_pseudo_id,
    CASE
      WHEN event_name = 'event_a'
       AND LEAD(event_name, 1) OVER (PARTITION BY user_id ORDER BY event_timestamp ASC) = 'event_b'
      THEN TIMESTAMP_DIFF(
             TIMESTAMP_MICROS(LEAD(event_timestamp, 1) OVER (PARTITION BY user_id ORDER BY event_timestamp ASC)),
             TIMESTAMP_MICROS(event_timestamp),
             MINUTE)
    END AS time_diff
  FROM `database`
  WHERE event_name IN ('event_a', 'event_b')
)
WHERE time_diff > 0.2
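A likely pitfall in this query: TIMESTAMP_DIFF returns an INT64 count of whole intervals, so with MINUTE every gap is truncated to whole minutes, and `time_diff > 0.2` then keeps only gaps of at least one minute. The Python sketch below (not BigQuery; `whole_minutes` is a hypothetical helper) replays that arithmetic on the event_a to event_b pairs in the sample data, assuming rows are partitioned by user_pseudo_id rather than user_id:

```python
# Sketch (not BigQuery): imitate TIMESTAMP_DIFF(end, start, MINUTE), which
# counts whole minutes, for non-negative gaps between microsecond timestamps.
MICROS_PER_MINUTE = 60_000_000


def whole_minutes(start_us: int, end_us: int) -> int:
    """Hypothetical helper: whole minutes from start_us to end_us (end_us >= start_us)."""
    return (end_us - start_us) // MICROS_PER_MINUTE


# (event_a, following event_b) timestamps from the question's sample data,
# assuming the window is partitioned by user_pseudo_id.
pairs = [
    (1587995938387000, 1587995943075005),  # aaa, first pair (about 4.7 s)
    (1593078261900005, 1593078881226002),  # aaa, second pair (about 10.3 min)
    (1590948191239003, 1590948208425007),  # bbb (about 17.2 s)
    (1589130017650008, 1589130462706020),  # ccc (about 7.4 min)
]

truncated = [whole_minutes(a, b) for a, b in pairs]
# On an integer column, `time_diff > 0.2` is the same as `time_diff >= 1`.
kept = [m for m in truncated if m > 0.2]
avg = round(sum(kept) / len(kept), 2)
print(truncated, kept, avg)  # [0, 10, 0, 7] [10, 7] 8.5
```

With these numbers the two sub-minute pairs vanish and the query would report 8.5 minutes, while the exact mean of the four gaps is about 271.6 seconds (roughly 4.5 minutes).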

Sample data:

user_pseudo_id   event      timestamp
aaa              event_a    1587995938387000
bbb              event_a    1590948191239003
aaa              event_b    1587995943075005
ccc              event_a    1589130017650008
aaa              event_a    1593078261900005
aaa              event_b    1593078881226002
bbb              event_b    1590948208425007
ccc              event_b    1589130462706020

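What I want is the average and total time between event_a and event_b for each user.
Do you have any suggestions? What matters is how much time elapses between these two specific events, no matter when the second event happens.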
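To check any of the queries, it helps to know the expected answer for the sample data. The following Python sketch (an illustration, not BigQuery; `per_user_stats` is a hypothetical name) pairs each event_a with the first later event_b from the same user and reports the per-user average and total in seconds:

```python
from bisect import bisect_right
from collections import defaultdict

# Sample rows from the question: (user_pseudo_id, event, timestamp in microseconds).
rows = [
    ("aaa", "event_a", 1587995938387000),
    ("bbb", "event_a", 1590948191239003),
    ("aaa", "event_b", 1587995943075005),
    ("ccc", "event_a", 1589130017650008),
    ("aaa", "event_a", 1593078261900005),
    ("aaa", "event_b", 1593078881226002),
    ("bbb", "event_b", 1590948208425007),
    ("ccc", "event_b", 1589130462706020),
]


def per_user_stats(rows):
    """Pair every event_a with the first event_b strictly after it (same user)
    and return {user: (average_seconds, total_seconds)}; users with no pair are omitted."""
    a_ts, b_ts = defaultdict(list), defaultdict(list)
    for user, event, ts in rows:
        if event == "event_a":
            a_ts[user].append(ts)
        elif event == "event_b":
            b_ts[user].append(ts)
    stats = {}
    for user, starts in a_ts.items():
        ends = sorted(b_ts.get(user, []))
        gaps = []
        for start in starts:
            i = bisect_right(ends, start)  # index of the first event_b later than start
            if i < len(ends):
                gaps.append((ends[i] - start) / 1_000_000)
        if gaps:
            stats[user] = (sum(gaps) / len(gaps), sum(gaps))
    return stats


for user, (avg_s, total_s) in sorted(per_user_stats(rows).items()):
    print(f"{user}: avg {avg_s:.3f} s, total {total_s:.3f} s")
```

For the sample this gives aaa an average of about 312.007 s and a total of about 624.014 s, while bbb (17.186 s) and ccc (445.056 s) each have a single pair. Pairing each event_a with the next event_b, rather than with the next event of any kind, is an assumption about what "whenever" means here.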

Answer #1 by eyh26e7m

Below is for BigQuery Standard SQL:


#standardSQL
SELECT
  user_pseudo_id,
  AVG(duration) AS avg_duration,    -- microseconds
  SUM(duration) AS total_duration   -- microseconds
FROM (
  -- duration = gap from this row to the same user's next event_a/event_b row
  SELECT *, LEAD(timestamp) OVER (win) - timestamp AS duration
  FROM `project.dataset.table`
  WHERE event IN ('event_a', 'event_b')
  WINDOW win AS (PARTITION BY user_pseudo_id ORDER BY timestamp)
)
-- keep only the gaps that start at an event_a
WHERE event = 'event_a'
GROUP BY user_pseudo_id
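LEAD in this answer measures from each event_a to whatever event comes next, so if a user fires event_a twice before event_b, the first gap ends at the second event_a rather than at event_b. The Python sketch below replays that logic (`lead_stats` is a hypothetical name). The first eight rows are the question's sample; user ddd and its timestamps are made up purely to show the a-to-a case. Durations stay in microseconds, as in the SQL:

```python
from collections import defaultdict

rows = [
    # Sample rows from the question: (user_pseudo_id, event, timestamp in microseconds).
    ("aaa", "event_a", 1587995938387000),
    ("bbb", "event_a", 1590948191239003),
    ("aaa", "event_b", 1587995943075005),
    ("ccc", "event_a", 1589130017650008),
    ("aaa", "event_a", 1593078261900005),
    ("aaa", "event_b", 1593078881226002),
    ("bbb", "event_b", 1590948208425007),
    ("ccc", "event_b", 1589130462706020),
    # Hypothetical user (not in the question): two event_a rows before one event_b.
    ("ddd", "event_a", 100),
    ("ddd", "event_a", 200),
    ("ddd", "event_b", 500),
]


def lead_stats(rows):
    """Mimic answer 1: partition by user, order by timestamp, take
    LEAD(timestamp) - timestamp on event_a rows, then AVG and SUM per user."""
    partitions = defaultdict(list)
    for user, event, ts in rows:
        if event in ("event_a", "event_b"):
            partitions[user].append((ts, event))
    stats = {}
    for user, events in partitions.items():
        events.sort()
        durations = [
            next_ts - ts
            # LEAD: the next row in the partition, whatever its event is
            for (ts, event), (next_ts, _) in zip(events, events[1:])
            if event == "event_a"
        ]
        # A trailing event_a has LEAD = NULL; AVG and SUM skip NULLs (NULL if nothing is left).
        stats[user] = (sum(durations) / len(durations), sum(durations)) if durations else (None, None)
    return stats


for user, (avg_us, total_us) in sorted(lead_stats(rows).items()):
    print(user, avg_us, total_us)
```

For the sample users the result matches pairing each event_a with the next event_b, but for ddd the first gap is 100 µs (to the second event_a) instead of 400 µs (to event_b), giving an average of 200 µs.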

Answer #2 by ccgok5k5

I would approach it like this:

with data as (
  select user_pseudo_id, event_name, event_timestamp
  from `database`
  where event_name in ('event_a', 'event_b')
),
ea as (
  -- Get first event_a per user
  select user_pseudo_id, min(event_timestamp) as first_a_ts from data where event_name = 'event_a' group by 1
),
eb as (
  -- Get first event_b per user
  select user_pseudo_id, min(event_timestamp) as first_b_ts from data where event_name = 'event_b' group by 1
),
joined as (
  -- Assume we only want to calculate duration if user has an event_b, hence inner join
  select *
  from ea
  inner join eb using (user_pseudo_id)
  where first_b_ts > first_a_ts
)
select
  -- event_timestamp is INT64 microseconds, so convert it before TIMESTAMP_DIFF
  avg(timestamp_diff(timestamp_micros(first_b_ts), timestamp_micros(first_a_ts), second)) / 60.0 as avg_duration_minutes
from joined

I left out your > 0.2 filter because I don't see why you would arbitrarily discard differences under 12 seconds (0.2 minutes).
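Because this answer keeps only each user's first event_a and first event_b, later pairs are ignored (aaa's second pair, about 10 minutes, never enters the result), and it returns one overall average rather than per-user figures. A Python sketch replaying that logic on the sample rows (`avg_first_a_to_first_b_minutes` is a hypothetical name; floor division imitates TIMESTAMP_DIFF(..., SECOND) for positive gaps):

```python
# Sample rows from the question: (user_pseudo_id, event, timestamp in microseconds).
rows = [
    ("aaa", "event_a", 1587995938387000),
    ("bbb", "event_a", 1590948191239003),
    ("aaa", "event_b", 1587995943075005),
    ("ccc", "event_a", 1589130017650008),
    ("aaa", "event_a", 1593078261900005),
    ("aaa", "event_b", 1593078881226002),
    ("bbb", "event_b", 1590948208425007),
    ("ccc", "event_b", 1589130462706020),
]


def avg_first_a_to_first_b_minutes(rows):
    """Mimic answer 2: first event_a and first event_b per user, inner join,
    keep users whose first event_b is later, average whole-second gaps, divide by 60."""
    first_a, first_b = {}, {}
    for user, event, ts in rows:
        if event == "event_a":
            first_a[user] = min(ts, first_a.get(user, ts))
        elif event == "event_b":
            first_b[user] = min(ts, first_b.get(user, ts))
    # Whole seconds between the two first timestamps (positive gaps only).
    seconds = [
        (first_b[u] - first_a[u]) // 1_000_000
        for u in first_a
        if u in first_b and first_b[u] > first_a[u]
    ]
    return sum(seconds) / len(seconds) / 60.0 if seconds else None


print(round(avg_first_a_to_first_b_minutes(rows), 4))  # 2.5889
```

The whole-second gaps are 4, 17 and 445 seconds, so the result is 466 / 3 / 60, roughly 2.59 minutes.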

Answer #3 by iqxoj9l9

If you want the time of the event_b that follows each event_a, you can use a conditional cumulative minimum:

SELECT ab.*
FROM (SELECT user_pseudo_id,
             event_name,  -- needed by the outer WHERE
             event_timestamp AS event_a_timestamp,
             -- earliest event_b at or after the current row, per user
             MIN(CASE WHEN event_name = 'event_b' THEN event_timestamp END) OVER
                 (PARTITION BY user_pseudo_id
                  ORDER BY event_timestamp
                  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
                 ) AS event_b_timestamp
      FROM `database`
      WHERE event_name IN ('event_a', 'event_b')
     ) ab
WHERE event_name = 'event_a'

Your question doesn't give enough detail to decide what else needs to be done.
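The cumulative minimum in this answer amounts to a suffix minimum: walk each user's rows from newest to oldest and remember the smallest event_b timestamp seen so far. A Python sketch of that idea on the sample rows (`next_b_for_each_a` is a hypothetical name; it partitions by user_pseudo_id):

```python
from collections import defaultdict

# Sample rows from the question: (user_pseudo_id, event, timestamp in microseconds).
rows = [
    ("aaa", "event_a", 1587995938387000),
    ("bbb", "event_a", 1590948191239003),
    ("aaa", "event_b", 1587995943075005),
    ("ccc", "event_a", 1589130017650008),
    ("aaa", "event_a", 1593078261900005),
    ("aaa", "event_b", 1593078881226002),
    ("bbb", "event_b", 1590948208425007),
    ("ccc", "event_b", 1589130462706020),
]


def next_b_for_each_a(rows):
    """For every event_a row, return (user, event_a_ts, earliest event_b_ts at or
    after it in the same user's partition), with None when no event_b follows."""
    partitions = defaultdict(list)
    for user, event, ts in rows:
        if event in ("event_a", "event_b"):
            partitions[user].append((ts, event))
    result = []
    for user, events in partitions.items():
        events.sort()
        # Minimum event_b timestamp over the current row and all later rows,
        # i.e. the ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING frame.
        running_min = None
        pairs = []
        for ts, event in reversed(events):
            if event == "event_b":
                running_min = ts if running_min is None else min(running_min, ts)
            else:
                pairs.append((user, ts, running_min))
        result.extend(reversed(pairs))  # restore ascending time order
    return result


for user, a_ts, b_ts in next_b_for_each_a(rows):
    print(user, a_ts, b_ts, None if b_ts is None else b_ts - a_ts)
```

Averaging or summing event_b_timestamp - event_a_timestamp per user on top of these pairs would give the per-user figures the question asks for.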
