I have an SQL Table resembling the following
-- Create the table
CREATE TABLE YourTableName (
val FLOAT,
p300 FLOAT,
p100 FLOAT
);
-- Insert values into the table
INSERT INTO YourTableName (val, p300, p100)
VALUES
(2295.91836734693400, -1.370, -2.340),
(1538.77551020407994, -0.035, 0.135),
(1269.68503937007615, -0.041, 0.300),
(-1130.38277511960990, -0.160, -0.075),
(1004.27350427350345, -0.070, 0.030),
(-2396.37305699481525, -0.210, 0.580),
(1632.46268656716000, -0.090, 0.290)
I need to find the combination of continuous ranges for columns p100 and p300 that return the max possible result in column val. This example is for 3 columns, but my real world case has more columns and more rows.
I firstly did a script finding the max sum subarray for each property seperately. This worked. Then I proceeded to try finding the max sum submatrix between p100 and p300, but I realized that this wouldn't work, as each of these has to be continuous and I can only order the matrix in one way.
2条答案
按热度按时间dpiehjr41#
All right, I have a potential solution, but it seems too neat, so I probably missed something :P
First, I added an identity column to quickly identify a row in the matrix.
Then I created a recursive CTE which loops every row sorted by the p300 value.
Every iteration creates a "path" which is basically a way to trace how rows were combined into a sum. For example, 1_2_3_5 meaning rows in the path were 1, 2, 3, 5.
The loop will then sum rows 1 - 6, 1 - 5, 1 - 4, 1 - 3, 1 - 2, 1 - 1, 2 - 6... etc.
Finally, I unwrap the path by doing a string_split, and fetch the original row from the matrix. The matrix inside the join is fixed so it also fetches the previous p100 value. This value is compared to the previous value of the generated path table. This is to make sure we actually respect the sequence of the values and not jump around.
Finally, I group by the sum and path where there aren't any "unordered" rows. Then it's easy to fetch the maximum sum.
This code should be easy to adapt for another p-value. Just add another LAG and make sure to check for continuous values.
6ojccjat2#
To find the combination of continuous ranges that returns the maximum sum, you can use the Kadane's algorithm. Kadane's algorithm is an efficient way to find the maximum subarray sum in an array of numbers. Here's how you can apply it to find the combination of continuous ranges:
nums
) that contains the continuous ranges.max_sum
to keep track of the maximum sum found so far andcurrent_sum
to keep track of the sum of the current subarray.nums
array and updatecurrent_sum
by adding the current element.current_sum
becomes negative, reset it to zero because a negative sum will not help maximize the overall sum.current_sum
is greater thanmax_sum
, updatemax_sum
and record the starting and ending indices of the maximum subarray.Here's a Python function implementing Kadane's algorithm to find the maximum subarray and the corresponding combination of continuous ranges: