带超前/滞后的聚合

7d7tgy0s  于 2021-05-29  发布在  Hadoop
关注(0)|答案(2)|浏览(377)

我有一个名为activities的表,它有一个memberid和一个时间戳。我想知道在给定的一个月里有多少成员执行了一项活动(即在活动表中有记录),而在过去的12个月里没有活动。我相信超前/滞后在这里是有帮助的,但我很难用脑子去想它。
(我在这里同时标记了apachehadoop和mssqlserver,因为我可以在其中任何一种中完成这项工作,并且我认为我可以很容易地将一种解决方案转换为另一种解决方案)。
感谢您的帮助!
谢谢您!

oknwwptz

oknwwptz1#

你可以使用 lag() :

select year(ts), month(ts),
       (count(distinct memberid) - 
        count(distinct case when prev_ts > dateadd(year, -1, ts) then memberid)
       ) as 
from (select memberid, 
             lag(ts) over (partition by memberid order by ts) as prev_ts
      from activities a
     ) a
group by year(ts), month(ts);
ojsjcaue

ojsjcaue2#

在使用lag函数时,我们需要先为每个成员和每个月创建一个记录,然后使用lag函数获取以前的活动月,最后使用where子句只获取我们想要的:

DECLARE 
  @year int = 2018, 
  @month int = 7;

WITH
  monthwise (MemberID, FirstOfMonth) AS (
    SELECT DISTINCT  MemberID, DATEADD(month, DATEDIFF(month, 0, ActivityDate), 0)
    FROM Activities
  ),
  prevActivity (MemberID, FirstOfMonth, prevFirstOfMonth) AS (
    SELECT MemberID, FirstOfMonth
    , LAG(FirstOfMonth) OVER (PARTITION BY MemberID ORDER BY FirstOfMonth)
    FROM monthwise
  )
SELECT MemberID
FROM prevActivity
WHERE MONTH(FirstOfMonth) = @month
  AND YEAR(FirstOfMonth) = @year
  AND (prevFirstOfMonth IS NULL OR DATEDIFF(month, prevFirstOfMonth, FirstOfMonth) > 12)

您也可以不使用lag函数来执行此操作:使用两个查询,一个用于本月有活动的成员,另一个用于前12个月有活动的成员。然后使用内部联接和左联接查找本月有活动但前几个月没有活动的成员:

DECLARE 
  @year int = 2018, 
  @month int = 7;

WITH
  this (MemberID) AS (
    SELECT DISTINCT MemberID
    FROM Activities 
    WHERE YEAR(ActivityDate) = @year
      AND MONTH(ActivityDate) = @month
  ),
  prev (MemberID) AS (
    SELECT DISTINCT MemberID
    FROM Activities
    WHERE ActivityDate < DATEADD(month, @month-1 +12*(@year-1900), 0)
      AND ActivityDate >= DATEADD(month, @month-1 +12*(@year-1901), 0)
  )
SELECT m.MemberID
FROM Members m
  INNER JOIN this ON m.MemberID = this.MemberID
  LEFT JOIN prev ON m.MemberID = prev.MemberID
WHERE prev.MemberID IS NULL

相关问题