How to classify returning users into different time buckets based on how often they visit

2hh7jdfx  posted on 2021-07-24 in Java

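I have a visit log table that tracks the visits of multiple users to the application. Its columns are a user ID and a timestamp. I have already been able to aggregate the visit data per user, so I now know in how many distinct weeks, biweeks, months, quarters and years each user visited the application, using the following logic: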

-- Assumed to be the body of the time_count CTE referenced further down.
SELECT user_id
       ,count( distinct date_part('year', timestamp) * 100 + date_part('week', timestamp)) as week
       ,count( distinct date_part('year', timestamp) * 100 + ceiling(date_part('week', timestamp)::decimal/2)) as biweek
       ,count( distinct date_part('year', timestamp) * 100 + date_part('month', timestamp)) as month
       ,count( distinct date_part('year', timestamp) * 100 + ceiling(date_part('month', timestamp)::decimal/3)) as quarter
       ,count( distinct date_part('year', timestamp)) as year
FROM visit_logs
-- One row per user; the aggregates need GROUP BY rather than DISTINCT.
GROUP BY user_id
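
As a rough illustration of what the `count(distinct ...)` expressions above are counting, the Python sketch below builds the same kind of period keys for a single date. The helper name `period_keys` is made up for this example, and it assumes PostgreSQL-style behaviour, where `date_part('week', ...)` returns the ISO 8601 week number while `date_part('year', ...)` returns the calendar year.

```python
from datetime import date
from math import ceil


def period_keys(d):
    """Mirror the year * 100 + period keys from the query for one date.

    Assumption: the week number is the ISO 8601 week and the year is the
    calendar year, which is how PostgreSQL's date_part behaves.
    """
    iso_week = d.isocalendar()[1]
    return {
        "week": d.year * 100 + iso_week,
        "biweek": d.year * 100 + ceil(iso_week / 2),
        "month": d.year * 100 + d.month,
        "quarter": d.year * 100 + ceil(d.month / 3),
        "year": d.year,
    }


# 2020-12-31 and 2021-01-01 fall in the same ISO week (2020-W53),
# but their calendar years differ, so they get two different week keys.
print(period_keys(date(2020, 12, 31))["week"])  # 202053
print(period_keys(date(2021, 1, 1))["week"])    # 202153
```

Under that assumption, a week that straddles New Year can be counted twice. If that matters, keying on `date_trunc('week', timestamp)` or pairing the week with `date_part('isoyear', timestamp)` would avoid it.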

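Now I want to take this a step further and label users as weekly, biweekly, monthly, quarterly, yearly or rarely returning, with an 80% threshold applied to every bucket. To clarify: within the selected time frame, if a user returned to the application in more than 80% of the weeks, they are a weekly user, and so on for the other buckets.
Below is the SQL code in which I tried to replicate the Python logic described above: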

, temp AS 
(
      SELECT 
              user_id
              ,SUM(week) * 0.8 as total_weeks
              ,SUM(biweek) * 0.8 as total_biweek
              ,SUM(month) * 0.8 as total_months
              ,SUM(quarter) * 0.8 as total_quarters
              ,SUM(year) * 0.8 as total_year
      FROM  time_count
      GROUP BY 1
)

,week_count as 
(
            SELECT CASE WHEN week > total_weeks THEN 'Weekly'
                        WHEN biweek > total_biweek  THEN 'Biweekly'
                        WHEN month > total_months THEN 'Monthly'
                        WHEN quarter > total_quarters THEN 'quarterly'
                        WHEN year > total_year THEN 'yearly'
                        ELSE 'rarely' 
                    END as time_bucket
            FROM time_count
            LEFT JOIN temp ON temp.user_id = time_count.user_id

)

SELECT * FROM week_count

I don't know where I went wrong, but it does not seem to work as expected. Any advice or help would be appreciated.

6ljaweal  #1

I'm a bit confused. I would approach this using window functions. Here is an example with the time frame built in:

select user_id,
       (case when num_weeks > 0.8 * total_weeks then 'weekly'
             when num_months > 0.8 * total_months then 'monthly'
             when num_years > 0.8 * total_years then 'yearly'
             else 'rarely'
        end) as frequency
from (select user_id,
             count(distinct date_trunc('week', timestamp)) as num_weeks,
             count(distinct date_trunc('month', timestamp)) as num_months,
             count(distinct date_trunc('year', timestamp)) as num_years,
             -- time elapsed since the user's first visit; age(later, earlier) takes two arguments
             extract(day from current_date - min(timestamp)) / 7 as total_weeks,
             extract(year from age(current_date, min(timestamp))) * 12 + extract(month from age(current_date, min(timestamp))) as total_months,
             extract(year from age(current_date, min(timestamp))) as total_years
      from visit_logs vl
      group by user_id
     ) vl;
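
To sanity-check the thresholds outside the database, the rule in the query above can be sketched in Python for a single user. This is only an approximation under stated assumptions: `classify` is a hypothetical helper, weeks start on Monday (as `date_trunc('week', ...)` does in PostgreSQL), and elapsed months and years are computed in a simplified way that only roughly matches `age()`.

```python
from datetime import date, timedelta


def classify(visits, today, threshold=0.8):
    """Label one user's visit dates as weekly, monthly, yearly or rarely.

    Assumption: the observation window runs from the user's first visit
    to `today`, as in the query above.
    """
    first = min(visits)

    # Distinct periods that contain at least one visit.
    num_weeks = len({v - timedelta(days=v.weekday()) for v in visits})
    num_months = len({(v.year, v.month) for v in visits})
    num_years = len({v.year for v in visits})

    # Periods elapsed since the first visit (rough equivalent of age()).
    total_weeks = (today - first).days / 7
    total_months = (today.year - first.year) * 12 + (today.month - first.month)
    if today.day < first.day:
        total_months -= 1  # the current month is not complete yet
    total_years = total_months // 12

    if num_weeks > threshold * total_weeks:
        return "weekly"
    if num_months > threshold * total_months:
        return "monthly"
    if num_years > threshold * total_years:
        return "yearly"
    return "rarely"


every_monday = [date(2021, 1, 4) + timedelta(weeks=i) for i in range(10)]
print(classify(every_monday, date(2021, 3, 15)))        # weekly
once_a_month = [date(2021, m, 5) for m in range(1, 7)]
print(classify(once_a_month, date(2021, 6, 20)))        # monthly
print(classify([date(2018, 3, 10)], date(2021, 7, 24)))  # rarely
```

The key point is that each active-period count is compared against the number of periods that elapsed in the window. In the question's version, `temp` sums over `time_count`, which already appears to hold one row per user, so each count ends up compared against 80% of itself and the first branch matches for any user with at least one visit. Note also that a user whose first visit is today has zero elapsed periods and would come out as "weekly" in this sketch, so a minimum window length may be worth adding.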
