SQL Server Getting earliest transaction records where a transaction code 12 occurs after a code 3, for each customer in my list

6ojccjat  于 2023-06-21  发布在  其他
关注(0)|答案(1)|浏览(133)

I have a customer transaction table with columns customer_number, transaction_date, transaction_code, transaction_amount and a few other columns.

For each customer_number in my predefined list (let's say 1,2,3,4), I want to get the earliest transactions where a code 12 transaction_code occurs after a code 3 transaction_code. So for example, if customer 1 had a transaction code 12 which occurred anytime after a transaction code 3, I would want to get both the transaction rows for customer 1 (the earliest transaction 12, and the earliest transaction 3 which occurs before it). Same for any other customers in my list that have a transaction code 12 occurring after a transaction code 3.

Here is what I have so far. I know why it doesn't work but I can't figure out how to make it get what I want. The below code checks if the transaction code 12 date is greater than the minimum of all transaction code 3 dates, not just for each particular customer_number.

SELECT t1.* from trans_table t1
WHERE t1.transaction_date > 
(select min(t2.transaction_date) 
from trans_table t2
WHERE t2.customer_number= t1.customer_number
AND t2.transaction_code = 3 
)
AND t1.transaction_code = 12
AND t1.customer_number in (1,2,3,4)
kr98yfug

kr98yfug1#

Here's a quick and dirty solution that might work:

DECLARE @data TABLE ( customer_number int, transaction_date date, transaction_code int, transaction_amount numeric(9,2))

INSERT INTO @data
VALUES  (1, '20221231', 1, 10)
,   (1, '20230101', 1, 10)
,   (1, '20230102', 3, 100)
,   (1, '20230104', 4, 100)
,   (1, '20230105', 12, 100)
,   (1, '20230202', 3, 100)

,   (2, '20230101', 12, 10)
,   (2, '20230102', 3, 100)
,   (2, '20230105', 12, 100)
,   (2, '20230202', 3, 100)

,   (3, '20230101', 12, 10)
,   (3, '20230102', 12, 10)
,   (3, '20230103', 3, 100)

,   (4, '20230101', 3, 10)
,   (4, '20230102', 3, 10)
,   (4, '20230103', 12, 100)

,   (5, '20230101', 3, 10)
,   (5, '20230102', 12, 10)
,   (5, '20230103', 3, 100)
,   (5, '20230104', 12, 100)

SELECT  TOP 1 WITH ties * -- 4
FROM    (
    SELECT  *
    ,   COUNT(*) OVER(partition BY customer_number, totalFlag) AS numberOfRecords -- 3
    FROM    (
            SELECT  sum(flag) OVER(partition BY customer_number ORDER BY transaction_date) AS totalFlag -- 2
            ,   *
            FROM    (
                SELECT  CASE WHEN lag(transaction_code) OVER(partition BY customer_number ORDER BY transaction_code) = 3 AND transaction_code = 12 THEN 0 ELSE 1 END flag -- 1
                ,   *
                FROM    @data
                WHERE   transaction_code IN (3,12)
            ) x
        ) x
    ) x
WHERE   x.numberOfRecords > 1
ORDER BY dense_rank() OVER(partition BY customer_number ORDER BY totalFlag) -- 5

The idea is to use a gaps and island style of query to group the transactions together by "flag". This flag is increased every time a number combination of transactions occurs.

The details here:

  1. This creates a flag which gets value 0 for desirable combo, and 1 for undesirable one. This allows us to increase the flag when we break a correct combo. So first row (3) will get flag = 1, and second row that matches (12) will keep flag = 1 meaning we get a desirable group.
  2. This summarize the above flag creating a running sum and creates a grouping column
  3. This counts the number of rows in our combination. This is needed because otherwise we might get only "3" as transaction combination, here we want to have 2 rows ("3" and "12")
  4. and 5. With ties is a little trick which returns all first rows, regardless of how many customers we get. The DENSE_RANK creates another counter which calculates the first "correct" transaction combination

It's a bit involved query, but avoids some other complicated things, so maybe it will work out for you

相关问题