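My main goal is to quickly see the average revenue per user over 30, 90, and 180 days. For each user I have an email address, the date they joined a cohort, and the date of each purchase: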
create temporary table cohorts (
    email varchar(64)
  , start_date timestamp
  , purchase_date timestamp
  , amount decimal(10,2)
)
;
insert into cohorts
values
('johnsmith@domain.com', '2020-01-01 00:00:00', '2020-01-01 12:00:00', '200.00')
, ('happyday@domain.com', '2020-01-01 00:00:00', '2020-02-28 00:00:00','100.00')
, ('happyday@domain.com', '2020-01-01 00:00:00', '2020-01-28 00:00:00','100.00')
, ('susieq@domain.com', '2020-01-01 00:00:00', '2020-05-01 00:00:00', '50.00')
, ('janedoe@domain.com', '2020-01-01 00:00:00', '2020-03-30 00:00:00', '75.00')
, ('janedoe@domain.com', '2020-01-01 00:00:00', '2020-07-30 00:00:00', '75.00')
;
If I want to see the average revenue per user within each time period, I would write something like this:
select
    case
        when datediff(day, start_date, purchase_date) < 30 then 'Within 30'
        when datediff(day, start_date, purchase_date) < 90 then 'Within 90'
        when datediff(day, start_date, purchase_date) < 180 then 'Within 180'
        else 'Older than 180'
    end as cohort_flag
  , count(distinct email) as num_of_emails
  , sum(amount) as summed_amt
  , sum(amount) / count(distinct email) as avg_value
from cohorts
group by 1;
cohort_flag    | num_of_emails | summed_amt | avg_value
Within 30      | 2             | 300.0      | 150.0
Within 90      | 2             | 175.0      | 87.5
Within 180     | 1             | 50.0       | 50.0
Older than 180 | 1             | 75.0       | 75.0
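However, because a CASE expression resolves to its first true clause, each purchase lands in exactly one bucket, so the later buckets don't include revenue from the earlier "cohorts". The result I expect is below, where users from the earlier cohorts are also counted in the later ones: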
cohort_flag    | num_of_emails | summed_amt | avg_value
Within 30      | 2             | 300.0      | 150.0
Within 90      | 3             | 475.0      | 158.33
Within 180     | 4             | 525.0      | 131.25
Older than 180 | 4             | 600.0      | 150.0
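One way to get overlapping (cumulative) windows without a helper table is to give each window its own conditional aggregate, instead of a single CASE that assigns each purchase to one bucket. This is a minimal sketch, assuming Redshift's `datediff(day, ...)` as used in the question; the column aliases (`emails_30`, `amt_30`, and so on) are hypothetical, and the result comes back as one wide row rather than one row per bucket.

```sql
-- Each window has its own CASE, so one purchase can count toward several windows.
-- A CASE without ELSE yields NULL, which count(distinct ...) and sum(...) ignore.
select
    count(distinct case when datediff(day, start_date, purchase_date) < 30 then email end) as emails_30
  , sum(case when datediff(day, start_date, purchase_date) < 30 then amount end) as amt_30
  , count(distinct case when datediff(day, start_date, purchase_date) < 90 then email end) as emails_90
  , sum(case when datediff(day, start_date, purchase_date) < 90 then amount end) as amt_90
  , count(distinct case when datediff(day, start_date, purchase_date) < 180 then email end) as emails_180
  , sum(case when datediff(day, start_date, purchase_date) < 180 then amount end) as amt_180
  -- No condition: every purchase, matching the last row of the expected output
  , count(distinct email) as emails_all
  , sum(amount) as amt_all
from cohorts;
```

Dividing each `amt_*` by the matching `emails_*` gives the per-window average. On the sample data, the counts and sums should match the expected table (2/300, 3/475, 4/525, 4/600).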
1 answer
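The same row of the table has to be counted in several groups, so you need a query like the one below (I hope Redshift supports the syntax).
It defines the groups, then LEFT JOINs them to the table: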
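The answer's original query is not included here. What follows is a minimal sketch of the approach it describes, not the answerer's exact code. It assumes Redshift syntax; the CTE `buckets` and its columns `sort_order` and `max_days` are hypothetical names. Following the expected output, the last bucket has no upper limit, so it covers all purchases.

```sql
-- Hypothetical helper CTE: one row per reporting window.
-- max_days is an exclusive upper bound; NULL means "no limit" (all purchases).
with buckets as (
    select 1 as sort_order, cast('Within 30' as varchar(20)) as cohort_flag, 30 as max_days
    union all select 2, 'Within 90', 90
    union all select 3, 'Within 180', 180
    union all select 4, 'Older than 180', null
)
select
    b.cohort_flag
  , count(distinct c.email) as num_of_emails
  , sum(c.amount) as summed_amt
  -- nullif avoids division by zero for a window with no purchases
  , round(sum(c.amount) / nullif(count(distinct c.email), 0), 2) as avg_value
from buckets b
-- The join condition is a range test rather than an equality, so one purchase
-- joins to every window it falls inside; that is what makes the totals cumulative.
left join cohorts c
    on b.max_days is null
    or datediff(day, c.start_date, c.purchase_date) < b.max_days
group by b.sort_order, b.cohort_flag
order by b.sort_order;
```

The LEFT JOIN keeps a window in the output even when no purchase falls inside it (its count is then 0 and its sum and average are NULL). `sort_order` exists only so the rows come back in window order rather than alphabetically.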