我正在运行一个查询,该查询返回按日期和国家分组的不同产品的活跃用户数。我还添加了两列'previous_day_active_users'和'returning_users',它们返回前一天的活跃用户数以及前一天和当天的活跃用户数。我想这样做是为了计算留存率(returning_users / previous_day_active_users)。问题是数字不匹配。此外,'previous_day_active_users'和'returning_users'总是匹配,这没有意义。我不明白我做错了什么。
查询:
WITH active_users AS (
SELECT
date_trunc('day'::text, customer_kpis_hourly.time_eet) as time_eet,
customer_kpis_hourly.market_code,
customer_kpis_hourly.user_id,
customer_kpis_hourly.active,
LAG(customer_kpis_hourly.active) OVER (PARTITION BY user_id ORDER BY time_eet) AS prev_active,
customer_kpis_hourly.active_casino,
customer_kpis_hourly.active_virtual_sports,
customer_kpis_hourly.active_poker,
customer_kpis_hourly.active_sportsbook
FROM delivery.customer_kpis_hourly
WHERE customer_kpis_hourly.time_eet >= '2021-01-01'
AND customer_kpis_hourly.time_eet < '2021-01-05'
AND customer_kpis_hourly.active
),
previous_day_active_users AS (
-- Calculate previous day's active users
SELECT
user_id,
time_eet - INTERVAL '1 day' AS previous_day
FROM active_users
WHERE prev_active
)
SELECT
'D'::text AS time_gran,
date_trunc('day'::text, active_users.time_eet) AS time_eet,
active_users.market_code,
count(DISTINCT active_users.user_id) FILTER (WHERE active_users.active) AS active_user,
count(DISTINCT active_users.user_id) FILTER (WHERE active_users.active_casino) AS active_casino,
count(DISTINCT active_users.user_id) FILTER (WHERE active_users.active_virtual_sports) AS active_vs,
count(DISTINCT active_users.user_id) FILTER (WHERE active_users.active_poker) AS active_poker,
count(DISTINCT active_users.user_id) FILTER (WHERE active_users.active_sportsbook) AS active_sb,
count(DISTINCT previous_day_active_users.user_id) as previous_day_active_user,
count(DISTINCT CASE WHEN previous_day_active_users.user_id IS NOT NULL THEN active_users.user_id END) AS returning_users
FROM
active_users
LEFT JOIN
previous_day_active_users
ON active_users.user_id = previous_day_active_users.user_id
-- AND date_trunc('day'::text, active_users.time_eet) = previous_day_active_users.previous_day
GROUP BY
'D'::text, date_trunc('day'::text, active_users.time_eet), active_users.market_code
字符串
1条答案
按热度按时间kqlmhetl1#
当你计算
previous_day_active_users
时,你基本上是为给定的user_id
计算time_eet - INTERVAL '1 day' AS previous_day
,其中字符串
评估为true。但这似乎并没有真正检查用户在前一天是否真的在工作。相反,您需要应用如下内容:
型
这将创建成对的
curr
和prev
,您可以轻松地将其转换为聚合查询,将curr.user_id
的总数作为活动用户的数量,并将非空的prev.user_id
的总数作为前一天的活动用户。然后,您可以计算它们的差异或比率。