SQL Server How do I group by values, but only when they're consecutive?

pkln4tw6  于 2023-04-10  发布在  其他
关注(0)|答案(1)|浏览(129)

I have the following table:
| ID | Number | Size |
| ------------ | ------------ | ------------ |
| 7 | 1 | 1.5 |
| 7 | 2 | 1.5 |
| 8 | 1 | 1.625 |
| 8 | 2 | 1.03125 |
| 8 | 3 | 1.03125 |
| 8 | 4 | 1.03125 |
| 8 | 5 | 1.625 |
| 8 | 6 | 1 |
| 8 | 7 | 1.625 |
| 8 | 8 | 1.625 |
| 8 | 9 | 1.625 |
| 9 | 1 | 1 |
| 9 | 2 | 2 |
| 9 | 3 | 3 |
| 9 | 4 | 4 |
| 9 | 5 | 1 |

I would like to create a column with a single string value for each ID that groups the size, but only when the values are consecutive based on the number, and includes the number of matching sizes.

For example, for ID=8, I would like something like this:

1.625 x 1.03125 (x3) x 1.625 x 1 x 1.625 (x3)

Please note that the number could increment to any integer value and the size could be any positive number, up to 7 decimal places.

As I've been thinking through it, these are the steps I've considered so far:

  1. Group the sizes when there are equal consecutive values. I've considered using the LAG()/LEAD() functions to accomplish this or the CASE expression. Perhaps there is some GROUP BY/subquery magic that could be performed to accomplish this as well? Regardless, this step would return something like this, where the "Group" column is anything that could be used to group the values within the same ID:
    | ID | Number | Size | Group |
    | ------------ | ------------ | ------------ | ------------ |
    | 7 | 1 | 1.5 | 1 |
    | 7 | 2 | 1.5 | 1 |
    | 8 | 1 | 1.625 | 1 |
    | 8 | 2 | 1.03125 | 2 |
    | 8 | 3 | 1.03125 | 2 |
    | 8 | 4 | 1.03125 | 2 |
    | 8 | 5 | 1.625 | 3 |
    | 8 | 6 | 1 | 4 |
    | 8 | 7 | 1.625 | 5 |
    | 8 | 8 | 1.625 | 5 |
    | 8 | 9 | 1.625 | 5 |
    | 9 | 1 | 1 | 1 |
    | 9 | 2 | 2 | 2 |
    | 9 | 3 | 3 | 3 |
    | 9 | 4 | 4 | 4 |
    | 9 | 5 | 1 | 5 |

  2. Then, once I've created a proper Group column, I was thinking I'd next group the sizes and create an additional column for the quantity in that group. This would look something like this:
    | ID | Number | Size | Group | Quantity |
    | ------------ | ------------ | ------------ | ------------ | ------------ |
    | 7 | 1 | 1.5 | 1 | 2 |
    | 8 | 1 | 1.625 | 1 | 1 |
    | 8 | 2 | 1.03125 | 2 | 3 |
    | 8 | 5 | 1.625 | 3 | 1 |
    | 8 | 6 | 1 | 4 | 1 |
    | 8 | 7 | 1.625 | 5 | 3 |
    | 9 | 1 | 1 | 1 | 1 |
    | 9 | 2 | 2 | 2 | 1 |
    | 9 | 3 | 3 | 3 | 1 |
    | 9 | 4 | 4 | 4 | 1 |
    | 9 | 5 | 1 | 5 | 1 |

Finally, potentially using a combination of the STRING_AGG() and CONCAT() functions, I would put the data into the following format:

IDSize
71.5 (x2)
81.625 (x1) x 1.03125 (x3) x 1.625 (x1) x 1 (x1) x 1.625 (x3)
91 (x1) x 2 (x1) x 3 (x1) x 4 (x1) x 1 (x1)

And, preferably, I would omit the " (x1)", such that the final form looks like this:

IDSize
71.5 (x2)
81.625 x 1.03125 (x3) x 1.625 x 1 x 1.625 (x3)
91 x 2 x 3 x 4 x 1

It's kind of a lot. But in the end, I'm hoping to turn the data from these tables into a more readable format for users, but 1) I wasn't exactly sure how to do this, and 2) Given potentially multiple routes for accomplishing this, I wasn't sure what the most performant option was.

Any thoughts?

P.S. I am open to any methods, it certainly doesn't have to follow the train of thought I included here.

*Edited to add a new dataset (ID=9)

6ie5vjzr

6ie5vjzr1#

You have a gaps and islands problem, one option is to use a difference between two row_number s to define the required groups, try the following using SQL Server syntax:

with cte1 as -- Step 1: using the difference between two row_numbers approach, create groups for consecutive similar values of Size.
(
  select *,
   row_number() over (partition by id order by number) - 
   row_number() over (partition by id, size order by number) grp
  from table_name 
),
cte2 as -- Step 2: get the counts for each group defined in the previous step
(
  select id, min(number) number, size, grp, count(*) cnt
  from cte1
  group by id, size, grp
)
  -- Step 3: use string_agg and concat functions to get the desired format
select id, 
  string_agg
    (-- use a case expression to not include (1x) when count = 1
      case when cnt > 1 then concat(size, ' (x', cnt, ')') else cast(size as varchar(20)) end, ' x '
    ) within group (order by number) Size
from cte2
group by id
order by id

See demo

相关问题