Incremental CASE statement for cohort analysis

Posted by wj8zmpe1 on 2021-07-24 in Java

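My main goal is to get a quick look at the average revenue per user over 30, 90, 180, and 180+ day periods. I have an email, the date the user joined a certain group (start_date), and the revenue date (purchase_date):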

create temporary table cohorts (
  email varchar(64)
, start_date timestamp
, purchase_date timestamp
, amount decimal(10,2)
)
;

insert into cohorts 
values 
  ('johnsmith@domain.com', '2020-01-01 00:00:00', '2020-01-01 12:00:00', '200.00')
, ('happyday@domain.com', '2020-01-01 00:00:00', '2020-02-28 00:00:00','100.00')
, ('happyday@domain.com', '2020-01-01 00:00:00', '2020-01-28 00:00:00','100.00')
, ('susieq@domain.com', '2020-01-01 00:00:00', '2020-05-01 00:00:00', '50.00')
, ('janedoe@domain.com', '2020-01-01 00:00:00', '2020-03-30 00:00:00', '75.00')
, ('janedoe@domain.com', '2020-01-01 00:00:00', '2020-07-30 00:00:00', '75.00')
;

If I want to see the average revenue per user within a given period, I would write this:

select 
case 
    when datediff(day,start_date, purchase_date) < 30 then 'Within 30' 
    when datediff(day,start_date, purchase_date) < 90 then 'Within 90' 
    when datediff(day,start_date, purchase_date) < 180 then 'Within 180' 
    else 'Older than 180' 
end as cohort_flag 
, count(distinct email) num_of_emails 
, sum(amount) summed_amt 
, sum(amount)/count(distinct email) as avg_value 
from cohorts
group by 1 

  cohort_flag  num_of_emails  summed_amt  avg_value
     Within 30            2       300.0      150.0
     Within 90            2       175.0       87.5
    Within 180            1        50.0       50.0
Older than 180            1        75.0       75.0

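However, because a CASE expression resolves to the first clause that is true, each row lands in exactly one bucket, so the later buckets do not include revenue from the earlier "cohorts". The result I want is below, where users from earlier cohorts also count toward the later ones: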

cohort_flag  num_of_emails  summed_amt  avg_value
     Within 30            2       300.0      150.0
     Within 90            3       475.0      158.33
    Within 180            4       525.0      131.25
Older than 180            4       600.0      150.0
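
The gap between the two result sets comes down to first-match versus all-match bucketing. As a rough illustration only (plain Python rather than Redshift, with the helper names `day_gap`, `first_match`, and `all_matches` invented here), the sketch below shows which buckets each sample purchase falls into under both rules. It assumes `datediff(day, ...)` counts calendar-day boundaries, so the time of day is dropped before subtracting.

```python
from datetime import datetime

# Sample rows from the question: (email, start_date, purchase_date, amount)
ROWS = [
    ("johnsmith@domain.com", "2020-01-01 00:00:00", "2020-01-01 12:00:00", 200.0),
    ("happyday@domain.com", "2020-01-01 00:00:00", "2020-02-28 00:00:00", 100.0),
    ("happyday@domain.com", "2020-01-01 00:00:00", "2020-01-28 00:00:00", 100.0),
    ("susieq@domain.com", "2020-01-01 00:00:00", "2020-05-01 00:00:00", 50.0),
    ("janedoe@domain.com", "2020-01-01 00:00:00", "2020-03-30 00:00:00", 75.0),
    ("janedoe@domain.com", "2020-01-01 00:00:00", "2020-07-30 00:00:00", 75.0),
]

# (exclusive upper bound in days, label); None means "no upper bound"
BUCKETS = [
    (30, "Within 30"),
    (90, "Within 90"),
    (180, "Within 180"),
    (None, "Older than 180"),
]

FMT = "%Y-%m-%d %H:%M:%S"


def day_gap(start, purchase):
    """Whole calendar days between two timestamps (time of day ignored)."""
    start_day = datetime.strptime(start, FMT).date()
    purchase_day = datetime.strptime(purchase, FMT).date()
    return (purchase_day - start_day).days


def first_match(gap):
    """Mimic CASE: only the first bucket whose bound exceeds the gap."""
    for bound, label in BUCKETS:
        if bound is None or gap < bound:
            return label


def all_matches(gap):
    """Cumulative view: every bucket whose bound exceeds the gap."""
    return [label for bound, label in BUCKETS if bound is None or gap < bound]


for email, start, purchase, amount in ROWS:
    gap = day_gap(start, purchase)
    print(email, gap, first_match(gap), all_matches(gap))
```

A purchase made 58 days after joining is counted once under the CASE rule ("Within 90") but three times under the cumulative rule, which is why a single row has to be matched against several groups.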

Answer #1 (twh00eeo)

The same row of the table has to be used for more than one group, so you need a query like this:

select 30 days_dif, 'Within 30' cohort_flag union all
select 90, 'Within 90' union all
select 180, 'Within 180' union all
select 2147483647, 'Older than 180'

(I hope this syntax is supported by Redshift.)
It defines the groups, which are then LEFT JOINed to the table:

select t.cohort_flag, 
       count(distinct c.email) num_of_emails,
       coalesce(sum(c.amount), 0) summed_amt, 
       coalesce(sum(c.amount), 0) / nullif(count(distinct c.email), 0) as avg_value 
from (
  select 30 days_dif, 'Within 30' cohort_flag union all
  select 90, 'Within 90' union all
  select 180, 'Within 180' union all
  select 2147483647, 'Older than 180'
) t left join cohorts c
on datediff(day, c.start_date, c.purchase_date) < t.days_dif
group by t.days_dif, t.cohort_flag
order by t.days_dif
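
To sanity-check the join without a Redshift cluster, here is a minimal sketch that runs an equivalent query on an in-memory SQLite database through Python's `sqlite3` module. Two substitutions are assumptions of this sketch, not part of the answer: SQLite has no `datediff`, so the day gap is computed as a difference of `julianday(date(...))` values, and `amount` is stored as `real` instead of `decimal(10,2)`. The function name `run_cohort_query` is made up for the example.

```python
import sqlite3

# Sample rows from the question: (email, start_date, purchase_date, amount)
ROWS = [
    ("johnsmith@domain.com", "2020-01-01 00:00:00", "2020-01-01 12:00:00", 200.0),
    ("happyday@domain.com", "2020-01-01 00:00:00", "2020-02-28 00:00:00", 100.0),
    ("happyday@domain.com", "2020-01-01 00:00:00", "2020-01-28 00:00:00", 100.0),
    ("susieq@domain.com", "2020-01-01 00:00:00", "2020-05-01 00:00:00", 50.0),
    ("janedoe@domain.com", "2020-01-01 00:00:00", "2020-03-30 00:00:00", 75.0),
    ("janedoe@domain.com", "2020-01-01 00:00:00", "2020-07-30 00:00:00", 75.0),
]

# Same shape as the answer's query; only the day-gap expression is
# adapted to SQLite (julianday on the date part stands in for datediff).
QUERY = """
select t.cohort_flag,
       count(distinct c.email) as num_of_emails,
       coalesce(sum(c.amount), 0) as summed_amt,
       coalesce(sum(c.amount), 0) / nullif(count(distinct c.email), 0) as avg_value
from (
  select 30 as days_dif, 'Within 30' as cohort_flag union all
  select 90, 'Within 90' union all
  select 180, 'Within 180' union all
  select 2147483647, 'Older than 180'
) t left join cohorts c
  on julianday(date(c.purchase_date)) - julianday(date(c.start_date)) < t.days_dif
group by t.days_dif, t.cohort_flag
order by t.days_dif
"""


def run_cohort_query():
    """Load the sample data into SQLite and return the cumulative cohorts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table cohorts ("
        "email text, start_date text, purchase_date text, amount real)"
    )
    conn.executemany("insert into cohorts values (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


results = run_cohort_query()
for row in results:
    print(row)
```

On the sample data this should reproduce the cumulative totals from the question (300, 475, 525, and 600 across 2, 3, 4, and 4 distinct emails). Because the last group uses 2147483647 as its bound, it matches every row and so behaves as an all-time total, despite the "Older than 180" label.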
