SQL Server Find combination of continuous ranges returning max sum

Posted by dvtswwa3 on 2023-08-02 under Other

I have a SQL table resembling the following:

-- Create the table
CREATE TABLE YourTableName (
    val FLOAT,
    p300 FLOAT,
    p100 FLOAT
);
-- Insert values into the table
INSERT INTO YourTableName (val, p300, p100)
VALUES
    (2295.91836734693400, -1.370, -2.340),
    (1538.77551020407994, -0.035, 0.135),
    (1269.68503937007615, -0.041, 0.300),
    (-1130.38277511960990, -0.160, -0.075),
    (1004.27350427350345, -0.070, 0.030),
    (-2396.37305699481525, -0.210, 0.580),
    (1632.46268656716000, -0.090, 0.290)

I need to find the combination of continuous ranges for columns p100 and p300 that returns the maximum possible sum of column val. This example uses 3 columns, but my real-world case has more columns and more rows.
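The question does not say whether the selected rows must also be adjacent when sorted. One possible reading is to pick one value range for p300 and one for p100, then sum val over every row that falls inside both ranges. Under that assumption, a brute-force Python sketch can pin down the target for small inputs. `best_rectangle` is a hypothetical helper. The search is O(n^5), so it is only a cross-check and is not suitable for the real table.

```python
from itertools import combinations_with_replacement

# (val, p300, p100) rows copied from the question's INSERT statement
ROWS = [
    (2295.91836734693400, -1.370, -2.340),
    (1538.77551020407994, -0.035, 0.135),
    (1269.68503937007615, -0.041, 0.300),
    (-1130.38277511960990, -0.160, -0.075),
    (1004.27350427350345, -0.070, 0.030),
    (-2396.37305699481525, -0.210, 0.580),
    (1632.46268656716000, -0.090, 0.290),
]


def best_rectangle(rows):
    """Brute force: try every [lo, hi] value range on p300 and on p100,
    sum val over the rows inside both ranges, and keep the largest total."""
    p300_values = sorted({p300 for _, p300, _ in rows})
    p100_values = sorted({p100 for _, _, p100 in rows})
    best = (float("-inf"), None, None)
    # combinations_with_replacement on a sorted list yields pairs with lo <= hi
    for lo3, hi3 in combinations_with_replacement(p300_values, 2):
        for lo1, hi1 in combinations_with_replacement(p100_values, 2):
            total = sum(val for val, p300, p100 in rows
                        if lo3 <= p300 <= hi3 and lo1 <= p100 <= hi1)
            if total > best[0]:
                best = (total, (lo3, hi3), (lo1, hi1))
    return best


total, p300_range, p100_range = best_rectangle(ROWS)
print(round(total, 3), p300_range, p100_range)  # prints: 6610.732 (-1.37, -0.035) (-2.34, 0.3)
```

On the sample rows, this reading selects rows 1, 2, 3, 4, 5 and 7. The p100 upper bound excludes only row 6.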

First, I wrote a script that finds the maximum-sum subarray for each property separately. This worked. Then I tried to find the maximum-sum submatrix over p100 and p300, but I realized that this wouldn't work: each range has to be continuous, and I can only order the matrix in one way.
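The per-column step described above is the classic maximum-subarray problem: sort the rows by one p column, then run Kadane's algorithm over val. Below is a minimal Python sketch of that step. The helper names `max_subarray` and `best_range_for` are hypothetical, and list position + 1 stands in for a row id. On the sample data, the two orderings select different row sets, which illustrates why the per-column results cannot simply be combined.

```python
# (val, p300, p100) rows from the question; list position + 1 plays the role of a row id
ROWS = [
    (2295.91836734693400, -1.370, -2.340),
    (1538.77551020407994, -0.035, 0.135),
    (1269.68503937007615, -0.041, 0.300),
    (-1130.38277511960990, -0.160, -0.075),
    (1004.27350427350345, -0.070, 0.030),
    (-2396.37305699481525, -0.210, 0.580),
    (1632.46268656716000, -0.090, 0.290),
]


def max_subarray(values):
    """Kadane's algorithm: return (best_sum, start, end) of the best contiguous slice."""
    best_sum, best_start, best_end = float("-inf"), 0, 0
    current_sum, current_start = 0.0, 0
    for i, v in enumerate(values):
        current_sum += v
        if current_sum > best_sum:  # record before resetting, so all-negative input works
            best_sum, best_start, best_end = current_sum, current_start, i
        if current_sum < 0:  # a negative running sum can only lower what follows
            current_sum, current_start = 0.0, i + 1
    return best_sum, best_start, best_end


def best_range_for(rows, column):
    """Sort rows by one p column (1 = p300, 2 = p100) and run Kadane over val in that order."""
    order = sorted(range(len(rows)), key=lambda i: rows[i][column])
    best_sum, start, end = max_subarray([rows[i][0] for i in order])
    return best_sum, sorted(i + 1 for i in order[start:end + 1])


for name, column in (("p300", 1), ("p100", 2)):
    best_sum, row_ids = best_range_for(ROWS, column)
    print(name, round(best_sum, 3), row_ids)
# p300 5445.197 [2, 3, 5, 7]
# p100 6610.732 [1, 2, 3, 4, 5, 7]
```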

Answer #1, by dpiehjr4

All right, I have a potential solution, but it seems too neat, so I probably missed something :P

CREATE TABLE YourTableName (
    val FLOAT,
    p300 FLOAT,
    p100 FLOAT,
    ix int IDENTITY
);
-- Insert values into the table
INSERT INTO YourTableName (val, p300, p100)
VALUES
    (2295.91836734693400, -1.370, -2.340),
    (1538.77551020407994, -0.035, 0.135),
    (1269.68503937007615, -0.041, 0.300),
    (-1130.38277511960990, -0.160, -0.075),
    (1004.27350427350345, -0.070, 0.030),
    (-2396.37305699481525, -0.210, 0.580),
    (1632.46268656716000, -0.090, 0.290)

;WITH cte AS (
    SELECT *
    , row_number() OVER(ORDER BY p300) AS sort
    FROM    YourTableName
    )
, cte2 AS (
    SELECT  val AS sum, c.sort, c.ix, CAST(ix AS NVARCHAR(MAX)) AS path
    FROM    cte  c
    UNION ALL
    SELECT  c.val + c2.sum, c.sort, c2.ix, CONCAT(path, '_', c.ix)
    FROM    cte c
    INNER JOIN cte2 c2
        ON  c.sort = c2.sort + 1
     )
SELECT TOP 1 x.ix, x.path, max(sum)
FROM (
    SELECT c.*
    ,   y.p100
    ,   CASE WHEN lag(p100) OVER(partition BY c.path ORDER BY p100) <> ISNULL(y.prev100, -9999999999) THEN 1 ELSE 0 END AS unordered
    FROM    cte2 c
    CROSS apply STRING_SPLIT(c.path, '_') x
    INNER JOIN (
        SELECT p100
        ,   lag(p100) OVER(ORDER BY p100) AS prev100
        ,   ix
        FROM    YourTableName y
        ) y
        ON  y.ix = x.Value
    ) x
GROUP BY x.ix, x.path
HAVING max(unordered) = 0
ORDER BY MAX(sum) DESC

-- about 5445.197 (ix 7, path 7_5_3_2)
--DROP TABLE YourTableName

First, I added an identity column to quickly identify a row in the matrix.

Then I created a recursive CTE that walks through the rows sorted by p300, extending each range one row at a time.

Every iteration creates a "path", which traces how rows were combined into a sum. For example, 1_2_3_5 means the path contains rows 1, 2, 3 and 5.

The loop therefore sums every contiguous run of rows in p300 order: positions 1 to 1, 1 to 2, ..., 1 to 7, then 2 to 2, 2 to 3, and so on.
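The Python code below is a rough equivalent of the path generation, intended only to make the recursion easier to follow. It is a sketch and is not the query itself. `contiguous_paths` is a hypothetical name, and the sketch assumes the sort column has no ties, because ROW_NUMBER would break ties arbitrarily.

```python
# Rows keyed by an IDENTITY-style ix: ix -> (val, p300, p100)
ROWS = {
    1: (2295.91836734693400, -1.370, -2.340),
    2: (1538.77551020407994, -0.035, 0.135),
    3: (1269.68503937007615, -0.041, 0.300),
    4: (-1130.38277511960990, -0.160, -0.075),
    5: (1004.27350427350345, -0.070, 0.030),
    6: (-2396.37305699481525, -0.210, 0.580),
    7: (1632.46268656716000, -0.090, 0.290),
}


def contiguous_paths(rows, column):
    """Mimic the recursive CTE: for every start position in the column's sort order,
    extend one row at a time and record (running sum, path string)."""
    order = sorted(rows, key=lambda ix: rows[ix][column])  # like ROW_NUMBER() OVER (ORDER BY ...)
    paths = []
    for start in range(len(order)):
        total, ids = 0.0, []
        for ix in order[start:]:
            total += rows[ix][0]
            ids.append(str(ix))
            paths.append((total, "_".join(ids)))
    return paths


paths = contiguous_paths(ROWS, 1)  # column 1 = p300
print(len(paths))  # 28: 7 rows give 7 * 8 / 2 contiguous ranges
print([p for _, p in paths if p.startswith("7")])  # ['7', '7_5', '7_5_3', '7_5_3_2']
```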

Next, I unwrap each path with STRING_SPLIT and join back to the original rows. The derived table inside the join is computed over the whole table, so its LAG returns each row's previous p100 value in the global p100 order. This value is compared with the previous p100 value within the path (the LAG partitioned by path). If the two differ for any row, the path skips over a row in p100 order and is flagged as unordered. This ensures that the selected rows respect the p100 sequence and don't jump around.
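The "unordered" check can be rendered in Python by comparing each path row's predecessor inside the path with its predecessor in the whole-table ordering. The sketch below does that. `is_contiguous` is a hypothetical helper, and it compares values rather than row ids to mirror the SQL.

```python
# Rows keyed by an IDENTITY-style ix: ix -> (val, p300, p100)
ROWS = {
    1: (2295.91836734693400, -1.370, -2.340),
    2: (1538.77551020407994, -0.035, 0.135),
    3: (1269.68503937007615, -0.041, 0.300),
    4: (-1130.38277511960990, -0.160, -0.075),
    5: (1004.27350427350345, -0.070, 0.030),
    6: (-2396.37305699481525, -0.210, 0.580),
    7: (1632.46268656716000, -0.090, 0.290),
}


def is_contiguous(path, rows, column):
    """Mimic the 'unordered' check: True if the rows in path form an unbroken run
    when the whole table is sorted by column."""
    order = sorted(rows, key=lambda ix: rows[ix][column])
    # Global LAG(column) OVER (ORDER BY column): previous value in whole-table order
    global_prev = {ix: rows[order[k - 1]][column] if k else None for k, ix in enumerate(order)}
    # Path rows sorted by the same column, like LAG(...) OVER (PARTITION BY path ORDER BY ...)
    ids = sorted((int(s) for s in path.split("_")), key=lambda ix: rows[ix][column])
    # Every row after the first must be preceded, inside the path, by its global predecessor
    return all(rows[ids[k - 1]][column] == global_prev[ids[k]] for k in range(1, len(ids)))


print(is_contiguous("7_5_3_2", ROWS, 2))  # True: rows 5, 2, 7, 3 are adjacent in p100 order
print(is_contiguous("6_4", ROWS, 2))      # False: rows 5, 2, 7, 3 sit between them in p100 order
```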

Finally, I group by starting row and path, keep only the paths without any "unordered" rows, and take the one with the highest sum.

This code should be easy to adapt for another p column: add another LAG pair for that column and include it in the unordered check.
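Extending the check to more columns amounts to requiring the chosen rows to sit at consecutive positions in the sort order of every p column. The Python code below is a brute-force sketch of that generalization. It uses hypothetical `best_path` and `is_run` helpers, assumes no ties, and runs in roughly cubic time in the number of rows, so it is meant as a cross-check rather than a replacement for the query.

```python
# Rows keyed by an IDENTITY-style ix
ROWS = {
    1: {"val": 2295.91836734693400, "p300": -1.370, "p100": -2.340},
    2: {"val": 1538.77551020407994, "p300": -0.035, "p100": 0.135},
    3: {"val": 1269.68503937007615, "p300": -0.041, "p100": 0.300},
    4: {"val": -1130.38277511960990, "p300": -0.160, "p100": -0.075},
    5: {"val": 1004.27350427350345, "p300": -0.070, "p100": 0.030},
    6: {"val": -2396.37305699481525, "p300": -0.210, "p100": 0.580},
    7: {"val": 1632.46268656716000, "p300": -0.090, "p100": 0.290},
}


def is_run(order, members):
    """True if members occupy consecutive positions in order."""
    positions = [k for k, ix in enumerate(order) if ix in members]
    return positions[-1] - positions[0] + 1 == len(positions)


def best_path(rows, columns):
    """Largest val sum over row sets that are contiguous in the sort order of every column."""
    orders = {c: sorted(rows, key=lambda ix: rows[ix][c]) for c in columns}
    first = orders[columns[0]]
    best = (float("-inf"), None)
    for start in range(len(first)):
        for end in range(start, len(first)):
            ids = first[start:end + 1]  # contiguous in the first column by construction
            members = set(ids)
            if all(is_run(orders[c], members) for c in columns[1:]):
                total = sum(rows[ix]["val"] for ix in ids)
                if total > best[0]:
                    best = (total, "_".join(map(str, ids)))
    return best


total, path = best_path(ROWS, ["p300", "p100"])
print(round(total, 3), path)  # 5445.197 7_5_3_2
```

Adding a column only means adding its name to `columns`. This "consecutive in every ordering" rule is stricter than summing val over the intersection of one value range per column. On the sample data, that looser reading reaches about 6610.73 by taking rows 1, 2, 3, 4, 5 and 7.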

Answer #2, by 6ojccjat

To find the combination of continuous ranges that returns the maximum sum, you can use Kadane's algorithm, an efficient way to find the maximum subarray sum in an array of numbers. Here's how you can apply it to find the combination of continuous ranges:

  1. Put the numbers to search into an array (call it nums), for example the val column ordered by one of the p columns.
  2. Initialize two variables: max_sum, which tracks the maximum sum found so far, and current_sum, which tracks the sum of the current subarray.
  3. Also keep track of the starting and ending indices of the maximum subarray, plus the index where the current subarray started.
  4. Iterate through nums and add each element to current_sum.
  5. If current_sum is greater than max_sum, update max_sum and record the starting and ending indices of the maximum subarray.
  6. If current_sum becomes negative, reset it to zero and start the next subarray at the following index, because a negative running sum can only lower the sums that follow.
  7. Finally, use the recorded indices to extract the subarray; it is the continuous range that returns the maximum sum.

Here's a Python function implementing Kadane's algorithm to find the maximum subarray and the corresponding combination of continuous ranges:

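def find_max_subarray(nums):
    max_sum = float('-inf')
    current_sum = 0
    start = 0          # start index of the best subarray found so far
    end = 0            # end index (inclusive) of the best subarray
    current_start = 0  # start index of the subarray currently being summed

    for i in range(len(nums)):
        current_sum += nums[i]
        if current_sum > max_sum:  # record a new best before any reset
            max_sum = current_sum
            start, end = current_start, i
        if current_sum < 0:  # a negative prefix can only lower later sums
            current_sum = 0
            current_start = i + 1
    return max_sum, nums[start:end + 1]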
