postgresql 计算返回的活动用户SQL

hpxqektj  于 2023-08-04  发布在  PostgreSQL
关注(0)|答案(1)|浏览(120)

我正在运行一个查询,该查询返回按日期和国家分组的不同产品的活跃用户数。我还添加了两列'previous_day_active_users'和'returning_users',它们返回前一天的活跃用户数以及前一天和当天的活跃用户数。我想这样做是为了计算留存率(returning_users / previous_day_active_users)。问题是数字不匹配。此外,'previous_day_active_users'和'returning_users'总是匹配,这没有意义。我不明白我做错了什么。
查询:

WITH active_users AS (
    SELECT
        date_trunc('day'::text, customer_kpis_hourly.time_eet) as time_eet,
        customer_kpis_hourly.market_code,
        customer_kpis_hourly.user_id,
        customer_kpis_hourly.active,
        LAG(customer_kpis_hourly.active) OVER (PARTITION BY user_id ORDER BY time_eet) AS prev_active,
        customer_kpis_hourly.active_casino,
        customer_kpis_hourly.active_virtual_sports,
        customer_kpis_hourly.active_poker,
        customer_kpis_hourly.active_sportsbook
    FROM delivery.customer_kpis_hourly
    WHERE customer_kpis_hourly.time_eet >= '2021-01-01'
    AND customer_kpis_hourly.time_eet < '2021-01-05'
    AND customer_kpis_hourly.active
),
previous_day_active_users AS (
    -- Calculate previous day's active users
    SELECT
        user_id,
        time_eet - INTERVAL '1 day' AS previous_day
    FROM active_users
    WHERE prev_active
)
SELECT
    'D'::text AS time_gran,
    date_trunc('day'::text, active_users.time_eet) AS time_eet,
    active_users.market_code,
    count(DISTINCT active_users.user_id) FILTER (WHERE active_users.active) AS active_user,
    count(DISTINCT active_users.user_id) FILTER (WHERE active_users.active_casino) AS active_casino,
    count(DISTINCT active_users.user_id) FILTER (WHERE active_users.active_virtual_sports) AS active_vs,
    count(DISTINCT active_users.user_id) FILTER (WHERE active_users.active_poker) AS active_poker,
    count(DISTINCT active_users.user_id) FILTER (WHERE active_users.active_sportsbook) AS active_sb,
    count(DISTINCT previous_day_active_users.user_id) as previous_day_active_user,
    count(DISTINCT CASE WHEN previous_day_active_users.user_id IS NOT NULL THEN active_users.user_id END) AS returning_users
FROM
    active_users
LEFT JOIN
    previous_day_active_users
    ON active_users.user_id = previous_day_active_users.user_id
--     AND date_trunc('day'::text, active_users.time_eet) = previous_day_active_users.previous_day
GROUP BY
    'D'::text, date_trunc('day'::text, active_users.time_eet), active_users.market_code

字符串

kqlmhetl

kqlmhetl1#

当你计算previous_day_active_users时,你基本上是为给定的user_id计算time_eet - INTERVAL '1 day' AS previous_day,其中

LAG(customer_kpis_hourly.active) OVER (PARTITION BY user_id ORDER BY time_eet) AS prev_active

字符串
评估为true。但这似乎并没有真正检查用户在前一天是否真的在工作。相反,您需要应用如下内容:

SELECT <your fields>
FROM delivery.customer_kpis_hourly curr
LEFT JOIN delivery.customer_kpis_hourly prev
ON curr.user_id = prev.user_id AND
   prev.time_eet >= '2021-01-03' AND
   prev.time_eet < '2021-01-04'
WHERE
   curr.time_eet >= '2021-01-04' AND
   curr.time_eet < '2021-01-05'


这将创建成对的currprev,您可以轻松地将其转换为聚合查询,将curr.user_id的总数作为活动用户的数量,并将非空的prev.user_id的总数作为前一天的活动用户。然后,您可以计算它们的差异或比率。

相关问题