SQL Server Avoiding first zero in SQL Lag() and Diff calculation

z4iuyo4d  于 2023-04-19  发布在  其他
关注(0)|答案(2)|浏览(78)

I have the following table showing number of new customers, and number of returning customers:

fact_date   NewCustomer ReturningCustomer
01/08/2022  10000       0
02/08/2022  0           9944
03/08/2022  0           9894
04/08/2022  0           9842
05/08/2022  0           9803
06/08/2022  0           9748
07/08/2022  0           9707
08/08/2022  0           9654

I am trying to create a difference column to calculate churn.

select *,
lag(ReturningCustomer,1) over (order by fact_date) as PrevDay,
ReturningCustomer - lag(ReturningCustomer,1) over (order by fact_date) as Diff 
from table
order by fact_date

Which gives me:

fact_date   NewCustomer ReturningCustomer   PrevDay Diff
01/08/2022  10000       0                   NULL    NULL
02/08/2022  0           9944                0       9944
03/08/2022  0           9894                9944    -50
04/08/2022  0           9842                9894    -52
05/08/2022  0           9803                9842    -39
06/08/2022  0           9748                9803    -55
07/08/2022  0           9707                9748    -41
08/08/2022  0           9654                9707    -53

However you can see the second date shows a "diff" of 9,944 . But really it should be -56 . Because the first day had 1000 customers and on the second day we lost 56 customers.

How can I make sure the Diff value is 56?

I have tried playing with case statements, default values for the lag etc. but it's a complete mess.

yvt65v4c

yvt65v4c1#

Is this possibly a better solution?

WITH cte AS (
  SELECT fact_date, 
         Newcustomer, 
         ReturningCustomer, 
         LAG(Newcustomer + ReturningCustomer) OVER (ORDER BY fact_date) AS PrevDay
  FROM tab
)

SELECT fact_date, 
       Newcustomer, 
       ReturningCustomer, 
       PrevDay,
          CASE WHEN PrevDay IS NULL THEN NULL 
          ELSE Newcustomer + ReturningCustomer - PrevDay END AS Diff
FROM cte
ORDER BY fact_date;

It would be way more easier to use just one column instead of NewCustomer and ReturningCustomer . Now, when you add e.g. 10 NewCustomer s it takes that into account.

db<>fiddle

t9aqgxwy

t9aqgxwy2#

Try this. Here diff is difference in total number of Customers (NewCustomer+ReturningCustomer). If necessary, take (ReturningCustomer) only.

select *,
   lag(NewCustomer+ReturningCustomer,1,0) 
            over (order by fact_date) as PrevDay
  ,(NewCustomer+ReturningCustomer) - lag(NewCustomer+ReturningCustomer,1,0) 
            over (order by fact_date) as Diff 
from test
order by fact_date

Query result

fact_dateNewCustomerReturningCustomerPrevDayDiff
2022-01-08100000010000
2022-02-080994410000-56
2022-03-08098949944-50
2022-04-08098429894-52
2022-05-08098039842-39
2022-06-08097489803-55
2022-07-08097079748-41
2022-08-08096549707-53

相关问题