Calculating 7-day retention with SQL

6jjcrrmo  posted on 2021-07-26 in Java

Given the following tables:

users                                 page_views
+--------------+-----------+          +----------+-----------+
| user_id      | varchar   | <---+    | pv_id    | varchar   |
| reg_ts       | timestamp |     |    | pv_ts    | timestamp |
| reg_device   | varchar   |     +--> | user_id  | varchar   |
| mktg_channel | varchar   |          | url      | varchar   |
+--------------+-----------+          | device   | varchar   |
                                      +----------+-----------+

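The "users" table has one row per registered user.
The "page_views" table has one row per page view event.
Of the users who first visit on a given day, what % come back one week later?
I'm currently using SQLite and have created a sample database, but my output is off.
Here is what I have so far: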

-- day 1 active users
SELECT *
FROM page_views
LEFT JOIN page_views AS future_page_views 
ON page_views.user_id = future_page_views.user_id
AND page_views.pv_ts = future_page_views.pv_ts - datetime(future_page_views.pv_ts, '+7 day')

-- day 7 retained users
SELECT 
  future_page_views.pv_ts,
  COUNT(DISTINCT page_views.user_id) as active_users,
  COUNT(DISTINCT future_page_views.user_id) as retained_users,
  CAST(COUNT(DISTINCT future_page_views.user_id) / COUNT(DISTINCT page_views.user_id) AS float) retention
FROM page_views
LEFT JOIN page_views as future_page_views 
ON page_views.user_id = future_page_views.user_id
AND page_views.pv_ts = future_page_views.pv_ts - datetime(page_views.pv_ts, '+7 day')
GROUP BY 1

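I'm not sure whether I should use the strftime function (as a stand-in for datediff) to capture the 7 days in this example. Any suggestions and feedback are welcome; thanks in advance.
**Edit**: sample data below.
Based on the dataset below, I expect only user id (8) to show up as retained for 7 days (first day 2020-01-02, last day 2020-01-09).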


Expected output:
user id
p. as first day
f. as last day
days retained (i.e. 1, 2, 3, 4, 5 days ...)
% of users who visited and came back on day 7

alen0pnh  #1

You can look at just the first two page views for each user and then aggregate. That gives:

select user_id, min(pv_ts) as first_ts,
       nullif(max(pv_ts), min(pv_ts)) as second_ts
from (select pv.*,
             row_number() over (partition by user_id order by pv_ts) as seqnum
      from page_views pv
     ) pv
where seqnum <= 2
group by user_id;
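
To see what this query returns, here is a minimal sketch that runs it from Python's built-in sqlite3 module against an in-memory table. The rows in `ROWS` and the helper `first_two_views` are hypothetical and are not the asker's data; the window function assumes SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample rows (NOT the asker's data): (pv_id, pv_ts, user_id)
ROWS = [
    ("a1", "2020-01-01 09:00:00", "7"),  # user 7: a single page view
    ("b1", "2020-01-02 10:00:00", "8"),  # user 8: first view
    ("b2", "2020-01-09 11:00:00", "8"),  # user 8: second view
    ("b3", "2020-01-20 08:00:00", "8"),  # third view, dropped by seqnum <= 2
    ("c1", "2020-01-03 12:00:00", "9"),
    ("c2", "2020-01-04 12:00:00", "9"),
]

# The query from the answer, plus an ORDER BY for stable output.
FIRST_TWO_SQL = """
select user_id, min(pv_ts) as first_ts,
       nullif(max(pv_ts), min(pv_ts)) as second_ts
from (select pv.*,
             row_number() over (partition by user_id order by pv_ts) as seqnum
      from page_views pv
     ) pv
where seqnum <= 2
group by user_id
order by user_id
"""


def first_two_views(rows):
    """Load rows into an in-memory page_views table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table page_views "
        "(pv_id text, pv_ts text, user_id text, url text, device text)"
    )
    conn.executemany(
        "insert into page_views (pv_id, pv_ts, user_id) values (?, ?, ?)", rows
    )
    result = conn.execute(FIRST_TWO_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in first_two_views(ROWS):
        print(row)
```

A user with only one page view gets a NULL second_ts, because nullif returns NULL when the maximum and minimum timestamps are equal. The same happens if a user's first two views share an identical timestamp.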

Then, to get the totals:

select count(*),
       sum(case when second_ts < datetime(first_ts, '+7 day') then 1 else 0 end)
from (select user_id, min(pv_ts) as first_ts,
             nullif(max(pv_ts), min(pv_ts)) as second_ts
      from (select pv.*,
                   row_number() over (partition by user_id order by pv_ts) as seqnum
            from page_views pv
           ) pv
      where seqnum <= 2
      group by user_id
     ) u;
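
Note that the totals query counts users whose second page view comes less than 7 days after the first, which is not quite the same as coming back on day 7. The sketch below runs both readings side by side from Python's sqlite3 module. The sample rows, the helper `run`, and the second query (`DAY7_BY_COHORT_SQL`, which groups users by first-visit date and checks for any view exactly 7 calendar days later) are illustrative assumptions and not part of the original answer. The modifier is written as '+7 day', the spaced form shown in the SQLite documentation.

```python
import sqlite3

# Hypothetical sample rows (NOT the asker's data): (pv_id, pv_ts, user_id)
ROWS = [
    ("a1", "2020-01-01 09:00:00", "7"),   # single view, never returns
    ("b1", "2020-01-02 10:00:00", "8"),   # returns on calendar day 7
    ("b2", "2020-01-09 11:00:00", "8"),
    ("c1", "2020-01-03 12:00:00", "9"),   # returns the next day
    ("c2", "2020-01-04 12:00:00", "9"),
    ("d1", "2020-01-02 15:00:00", "10"),  # returns two days later
    ("d2", "2020-01-04 15:00:00", "10"),
]

# The answer's totals query: second view less than 7 days after the first.
WITHIN_7_DAYS_SQL = """
select count(*) as users,
       sum(case when second_ts < datetime(first_ts, '+7 day') then 1 else 0 end) as returned
from (select user_id, min(pv_ts) as first_ts,
             nullif(max(pv_ts), min(pv_ts)) as second_ts
      from (select pv.*,
                   row_number() over (partition by user_id order by pv_ts) as seqnum
            from page_views pv
           ) pv
      where seqnum <= 2
      group by user_id
     ) u
"""

# Assumed stricter reading: per first-visit date, the share of users with
# any page view exactly 7 calendar days after their first one.
DAY7_BY_COHORT_SQL = """
select first_day,
       count(*) as users,
       sum(returned_day7) as day7_users,
       round(100.0 * sum(returned_day7) / count(*), 1) as day7_pct
from (select f.user_id,
             date(f.first_ts) as first_day,
             exists (select 1
                     from page_views pv
                     where pv.user_id = f.user_id
                       and date(pv.pv_ts) = date(f.first_ts, '+7 day')) as returned_day7
      from (select user_id, min(pv_ts) as first_ts
            from page_views
            group by user_id) f
     ) c
group by first_day
order by first_day
"""


def run(sql, rows=ROWS):
    """Load rows into an in-memory page_views table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table page_views (pv_id text, pv_ts text, user_id text)")
    conn.executemany("insert into page_views values (?, ?, ?)", rows)
    out = conn.execute(sql).fetchall()
    conn.close()
    return out


if __name__ == "__main__":
    print(run(WITHIN_7_DAYS_SQL))
    for row in run(DAY7_BY_COHORT_SQL):
        print(row)
```

On these rows the two readings pick out different users: users 9 and 10 return within 7 days, while only user 8 has a view on the seventh calendar day after the first visit. Which definition is wanted depends on how "a week later" is meant in the question.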
