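So I have a visit log table that basically tracks multiple users' visits to an application. It has a user ID column and a timestamp column. I've been able to aggregate each user's visit data. For example, I now know in how many distinct weeks, biweekly periods, months, quarters, and years each user visited the application, using the following logic: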
-- one row per user: the number of distinct periods in which the user visited
SELECT user_id
,count( distinct date_part('year', timestamp) * 100 + date_part('week', timestamp)) as week
,count( distinct date_part('year', timestamp) * 100 + ceiling(date_part('week', timestamp)::decimal/2)) as biweek
,count( distinct date_part('year', timestamp) * 100 + date_part('month', timestamp)) as month
,count( distinct date_part('year', timestamp) * 100 + ceiling(date_part('month', timestamp)::decimal/3)) as quarter
,count( distinct date_part('year', timestamp)) as year
FROM visit_logs
GROUP BY user_id
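To make the period keys concrete, here is a small Python sketch that mirrors them. It assumes, as in PostgreSQL, that `date_part('week', ...)` returns the ISO 8601 week number. The function names `period_keys` and `distinct_period_counts` are hypothetical.

The sketch also exposes a caveat. Pairing the calendar year with the ISO week can merge two different weeks: 2024-01-01 and 2024-12-30 both get key 202401, because 2024-12-30 falls in ISO week 1 of 2025. In PostgreSQL, `date_part('isoyear', ...)` is the field that pairs consistently with the ISO week.

```python
import math
from datetime import date


def period_keys(d: date) -> dict:
    """Per-bucket keys built like the SQL above: calendar year * 100 + period.

    Assumes date_part('week', ...) is the ISO 8601 week number (PostgreSQL behavior).
    """
    iso_week = d.isocalendar()[1]
    return {
        "week": d.year * 100 + iso_week,
        "biweek": d.year * 100 + math.ceil(iso_week / 2),
        "month": d.year * 100 + d.month,
        "quarter": d.year * 100 + math.ceil(d.month / 3),
        "year": d.year,
    }


def distinct_period_counts(visits):
    """visits: iterable of (user_id, date) pairs.

    Returns {user_id: {bucket: number of distinct keys}}, i.e. the
    COUNT(DISTINCT ...) per user that the GROUP BY user_id query computes.
    """
    keys_by_user = {}
    for user_id, d in visits:
        buckets = keys_by_user.setdefault(user_id, {})
        for bucket, key in period_keys(d).items():
            buckets.setdefault(bucket, set()).add(key)
    return {
        user_id: {bucket: len(keys) for bucket, keys in buckets.items()}
        for user_id, buckets in keys_by_user.items()
    }


# 2024-12-30 is in ISO week 1 of 2025, so with the calendar year it collides
# with the first week of 2024 and both visits count as a single "week".
counts = distinct_period_counts([("u1", date(2024, 1, 1)), ("u1", date(2024, 12, 30))])
print(counts["u1"])  # {'week': 1, 'biweek': 1, 'month': 2, 'quarter': 2, 'year': 1}
```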
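Now I want to go a step further and classify users as weekly, biweekly, monthly, quarterly, yearly, or rarely returning users, applying an 80% threshold to every bucket. To clarify: within the selected time frame, if a user returns to the application in more than 80% of the weeks, they are a weekly user, and so on for the other buckets.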
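A minimal Python sketch of that rule follows. The bucket order, the label strings, and the `totals` values are assumptions for illustration. A user gets the label of the finest bucket in which they were active in more than 80% of the periods that the selected time frame contains. The key point is that the denominators come from the time frame, not from the user's own counts.

```python
BUCKETS = ["week", "biweek", "month", "quarter", "year"]  # finest first
LABELS = {
    "week": "Weekly",
    "biweek": "Biweekly",
    "month": "Monthly",
    "quarter": "Quarterly",
    "year": "Yearly",
}


def classify(active: dict, totals: dict, threshold: float = 0.8) -> str:
    """Label of the finest bucket in which the user was active in more than
    `threshold` of the periods contained in the selected time frame.

    active: per bucket, the number of distinct periods with at least one visit.
    totals: per bucket, the number of periods in the time frame.
    """
    for bucket in BUCKETS:
        if active.get(bucket, 0) > threshold * totals[bucket]:
            return LABELS[bucket]
    return "Rarely"


# Hypothetical 12-week frame: 12 weeks, 6 biweeks, 3 months, 1 quarter, 1 year.
totals = {"week": 12, "biweek": 6, "month": 3, "quarter": 1, "year": 1}
print(classify({"week": 10, "biweek": 6, "month": 3, "quarter": 1, "year": 1}, totals))  # Weekly
print(classify({"week": 6, "biweek": 6, "month": 3, "quarter": 1, "year": 1}, totals))   # Biweekly
```

The comparison is a strict "more than": with 5 weeks in the frame, a user active in exactly 4 of them (80%) does not qualify as weekly.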
Here is the SQL I wrote to try to replicate the Python logic described above:
-- assumes the first query is wrapped as: WITH time_count AS ( ... )
, temp AS
(
SELECT
user_id
,SUM(week) * 0.8 as total_weeks
,SUM(biweek) * 0.8 as total_biweek
,SUM(month) * 0.8 as total_months
,SUM(quarter) * 0.8 as total_quarters
,SUM(year) * 0.8 as total_year
FROM time_count
GROUP BY 1
)
,week_count as
(
SELECT time_count.user_id
,CASE WHEN week > total_weeks THEN 'Weekly'
WHEN biweek > total_biweek THEN 'Biweekly'
WHEN month > total_months THEN 'Monthly'
WHEN quarter > total_quarters THEN 'quarterly'
WHEN year > total_year THEN 'yearly'
ELSE 'rarely'
END as time_bucket
FROM time_count
LEFT JOIN temp ON temp.user_id = time_count.user_id
)
SELECT * FROM week_count
I don't know where I went wrong, but it doesn't seem to work as expected. Any advice or help as soon as possible would be appreciated.
1 Answer
I'm a bit confused. I would use window functions for this. Here is an example with the time frame built in:
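The answer's own example query is not included in this copy. What follows is not that query. It is a hedged sketch, in plain Python rather than as a window-function query, of why the question's SQL misbehaves and what a time-frame-based denominator looks like.

Assuming `time_count` is the per-user query from the question (one row per user), `SUM(week)` in `temp` is just that user's own week count. So `week > 0.8 * week` is true for every user with at least one visit, and every such user is labeled 'Weekly'. The threshold has to be 80% of the number of periods in the selected time frame instead.

Everything in the sketch is illustrative: the names `period_keys` and `classify_users`, the sample data, and the 2024-01-01 to 2024-03-24 frame (exactly ISO weeks 1 to 12 of 2024). Periods that lie only partly inside the frame count as whole periods, which is also an assumption.

```python
import math
from datetime import date, timedelta

BUCKETS = ["week", "biweek", "month", "quarter", "year"]  # finest first
LABELS = {"week": "Weekly", "biweek": "Biweekly", "month": "Monthly",
          "quarter": "Quarterly", "year": "Yearly"}


def period_keys(d: date) -> dict:
    # ISO year paired with ISO week (cf. PostgreSQL date_part('isoyear'/'week', ...));
    # calendar year paired with month, quarter, and year.
    iso_year, iso_week, _ = d.isocalendar()
    return {
        "week": iso_year * 100 + iso_week,
        "biweek": iso_year * 100 + math.ceil(iso_week / 2),
        "month": d.year * 100 + d.month,
        "quarter": d.year * 100 + math.ceil(d.month / 3),
        "year": d.year,
    }


def classify_users(visits, start: date, end: date, threshold: float = 0.8) -> dict:
    """visits: iterable of (user_id, date); the frame is start..end inclusive."""
    # Denominators: distinct periods of each bucket that the frame touches.
    totals = {b: set() for b in BUCKETS}
    day = start
    while day <= end:
        for b, key in period_keys(day).items():
            totals[b].add(key)
        day += timedelta(days=1)

    # Numerators: distinct periods per user, counting only visits inside the frame.
    active = {}
    for user_id, d in visits:
        if start <= d <= end:
            seen = active.setdefault(user_id, {b: set() for b in BUCKETS})
            for b, key in period_keys(d).items():
                seen[b].add(key)

    # First (finest) bucket above the threshold wins, as in the CASE expression.
    labels = {}
    for user_id, seen in active.items():
        labels[user_id] = "Rarely"
        for b in BUCKETS:
            if len(seen[b]) > threshold * len(totals[b]):
                labels[user_id] = LABELS[b]
                break
    return labels


monday = date(2024, 1, 1)
visits = (
    [("a", monday + timedelta(weeks=i)) for i in range(10)]          # 10 of 12 weeks
    + [("b", monday + timedelta(weeks=i)) for i in range(0, 12, 2)]  # every other week
    + [("c", date(2024, m, 10)) for m in (1, 2, 3)]                  # once a month
    + [("d", date(2024, 2, 10)), ("d", date(2024, 6, 1))]            # 2nd visit is outside the frame
)
print(classify_users(visits, date(2024, 1, 1), date(2024, 3, 24)))
# {'a': 'Weekly', 'b': 'Biweekly', 'c': 'Monthly', 'd': 'Quarterly'}
```

User `d` shows a side effect of the frame length. When the frame covers a single quarter, any visit at all clears the quarterly threshold. The frame therefore needs to span several of the coarsest periods you want to tell apart.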