So I have the following data:
| User ID | CreatedOn |
| ------------ | ------------ |
| A | 01.01.2023 |
| A | 02.01.2023 |
| A | 03.01.2023 |
| A | 05.01.2023 |
| A | 06.01.2023 |
| A | 07.01.2023 |
| A | 08.01.2023 |
| B | 01.01.2023 |
| B | 02.01.2023 |
| B | 03.01.2023 |
| B | 04.01.2023 |
| B | 05.01.2023 |
| B | 06.01.2023 |
| B | 07.01.2023 |
I need to calculate the number of users who signed in on a 7 days consecutive period. For example, User A will not be counted since we had an interruption on the 4th but user B will be counted once. It should be any 7 day period. Can someone help me with the code? I am using SQL Server. Thanks
I tried row numbers and lag but did not know where to take it from there.
with x as (
select distinct
UserId,
CreatedOn,
LAG(CreatedOn) OVER (PARTITION BY UserId ORDER BY CreatedOn) AS PrevDate,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY CreatedOn) AS RowNumber
from table)
select
*,
DATEDIFF(day, PrevDate, CreatedOn) date_diff,
case when DATEDIFF(day, PrevDate, CreatedOn) = 1 then RowNumber else 1 end as new_row
from table
2条答案
按热度按时间lzfw57am1#
To start, join the table to itself, where the join conditions are for the same user and rows with dates in the range for the date of the current row, back to seven days in the past:
Now we want to further reduce this to only show distinct days from the second table for each source user/date (assuming it's either actually a datetime column or you have the possibility of duplicates):
Then we can group this by user, and look for user groups having 7 records. Because we've already done a
DISTINCT
, we'll also need to put the prior steps in a subquery or CTE. Finally, we need to add anotherDISTINCT
, so a user is not repeated for each additional qualifying group (or for each additional day in spans going longer than seven days):See it work here:
https://dbfiddle.uk/fDnoLpLK
If you check that link, note how I formatted the date literals. Different cultures, languages, and dialects have their own expectations around how dates are formatted, and the SQL language is no different. Regardless of how your own culture expects to use dates, when communicating with a database in it's own SQL dialect, you should use it's own format. For SQL Server that format is the unseparated
yyyyMMdd
. Usingdd.MM.yyyy
is wrong at this level, even if its what you and your users will eventually expect to see.tf7tbtn22#
Here is a way to do it using window functions like
LAG()
andSUM()
:You have a gaps and islands problem, you could use the difference between
row_number
and the running total of the differences of dates to give a unique id to each successive group of rows.Demo here