I have the following table:
| ID | Number | Size |
| ------------ | ------------ | ------------ |
| 7 | 1 | 1.5 |
| 7 | 2 | 1.5 |
| 8 | 1 | 1.625 |
| 8 | 2 | 1.03125 |
| 8 | 3 | 1.03125 |
| 8 | 4 | 1.03125 |
| 8 | 5 | 1.625 |
| 8 | 6 | 1 |
| 8 | 7 | 1.625 |
| 8 | 8 | 1.625 |
| 8 | 9 | 1.625 |
| 9 | 1 | 1 |
| 9 | 2 | 2 |
| 9 | 3 | 3 |
| 9 | 4 | 4 |
| 9 | 5 | 1 |
I would like to create a column with a single string value for each ID that groups the sizes (but only when the values are consecutive based on Number) and includes the count of matching sizes in each group.
For example, for ID=8, I would like something like this:
1.625 x 1.03125 (x3) x 1.625 x 1 x 1.625 (x3)
Please note that Number could increment to any integer value, and Size could be any positive number with up to 7 decimal places.
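For reference, the sample data above could be loaded like this (I'm using a hypothetical #SizeData table and a decimal(18, 7) column for the size throughout the sketches below; those names and types aren't from my actual schema):

```sql
-- Hypothetical setup matching the sample data above.
CREATE TABLE #SizeData (
    ID     int            NOT NULL,
    Number int            NOT NULL,
    Size   decimal(18, 7) NOT NULL  -- "up to 7 decimal places"
);

INSERT INTO #SizeData (ID, Number, Size) VALUES
(7, 1, 1.5),   (7, 2, 1.5),
(8, 1, 1.625), (8, 2, 1.03125), (8, 3, 1.03125), (8, 4, 1.03125),
(8, 5, 1.625), (8, 6, 1),       (8, 7, 1.625),   (8, 8, 1.625),
(8, 9, 1.625),
(9, 1, 1),     (9, 2, 2),       (9, 3, 3),       (9, 4, 4),
(9, 5, 1);
```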
As I've been thinking through it, these are the steps I've considered so far:
Group the sizes when there are equal consecutive values. I've considered using the LAG()/LEAD() functions or a CASE expression to accomplish this; perhaps there is some GROUP BY/subquery magic that could do it as well. Regardless, this step would return something like the table below, where the "Group" column is anything that could be used to group the values within the same ID.
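As a rough sketch of the LAG() idea (against the hypothetical #SizeData table above), the query below flags each row whose size differs from the previous row for the same ID and turns a running SUM() of those flags into the group number; the intermediate result it should produce follows the code:

```sql
-- Sketch of step 1: mark where a run of equal sizes starts, then number the runs.
WITH flagged AS (
    SELECT ID, Number, Size,
           CASE WHEN Size = LAG(Size) OVER (PARTITION BY ID ORDER BY Number)
                THEN 0 ELSE 1 END AS IsNewGroup  -- 1 whenever the size changes (or on the first row per ID)
    FROM #SizeData
)
SELECT ID, Number, Size,
       SUM(IsNewGroup) OVER (PARTITION BY ID ORDER BY Number
                             ROWS UNBOUNDED PRECEDING) AS [Group]
FROM flagged
ORDER BY ID, Number;
```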
| ID | Number | Size | Group |
| ------------ | ------------ | ------------ | ------------ |
| 7 | 1 | 1.5 | 1 |
| 7 | 2 | 1.5 | 1 |
| 8 | 1 | 1.625 | 1 |
| 8 | 2 | 1.03125 | 2 |
| 8 | 3 | 1.03125 | 2 |
| 8 | 4 | 1.03125 | 2 |
| 8 | 5 | 1.625 | 3 |
| 8 | 6 | 1 | 4 |
| 8 | 7 | 1.625 | 5 |
| 8 | 8 | 1.625 | 5 |
| 8 | 9 | 1.625 | 5 |
| 9 | 1 | 1 | 1 |
| 9 | 2 | 2 | 2 |
| 9 | 3 | 3 | 3 |
| 9 | 4 | 4 | 4 |
| 9 | 5 | 1 | 5 |

Then, once I've created a proper Group column, I was thinking I'd next group the sizes and create an additional column for the quantity in that group. This would look something like this (a sketch of this step follows the table):
| ID | Number | Size | Group | Quantity |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| 7 | 1 | 1.5 | 1 | 2 |
| 8 | 1 | 1.625 | 1 | 1 |
| 8 | 2 | 1.03125 | 2 | 3 |
| 8 | 5 | 1.625 | 3 | 1 |
| 8 | 6 | 1 | 4 | 1 |
| 8 | 7 | 1.625 | 5 | 3 |
| 9 | 1 | 1 | 1 | 1 |
| 9 | 2 | 2 | 2 | 1 |
| 9 | 3 | 3 | 3 | 1 |
| 9 | 4 | 4 | 4 | 1 |
| 9 | 5 | 1 | 5 | 1 |
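Continuing the sketch for step 2 (the CTEs from the previous example are repeated so this snippet runs on its own), the rows can be aggregated by ID and group, keeping the first Number and the run length:

```sql
-- Sketch of step 2: one row per (ID, Group) with the run length as Quantity.
WITH flagged AS (
    SELECT ID, Number, Size,
           CASE WHEN Size = LAG(Size) OVER (PARTITION BY ID ORDER BY Number)
                THEN 0 ELSE 1 END AS IsNewGroup
    FROM #SizeData
), grouped AS (
    SELECT ID, Number, Size,
           SUM(IsNewGroup) OVER (PARTITION BY ID ORDER BY Number
                                 ROWS UNBOUNDED PRECEDING) AS Grp
    FROM flagged
)
SELECT ID, MIN(Number) AS Number, Size, Grp AS [Group], COUNT(*) AS Quantity
FROM grouped
GROUP BY ID, Grp, Size
ORDER BY ID, MIN(Number);
```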
Finally, potentially using a combination of the STRING_AGG() and CONCAT() functions, I would put the data into the following format:
| ID | Size |
| ------------ | ------------ |
| 7 | 1.5 (x2) |
| 8 | 1.625 (x1) x 1.03125 (x3) x 1.625 (x1) x 1 (x1) x 1.625 (x3) |
| 9 | 1 (x1) x 2 (x1) x 3 (x1) x 4 (x1) x 1 (x1) |
And, preferably, I would omit the " (x1)" suffix, such that the final form looks like this (a STRING_AGG() sketch follows the table):
| ID | Size |
| ------------ | ------------ |
| 7 | 1.5 (x2) |
| 8 | 1.625 x 1.03125 (x3) x 1.625 x 1 x 1.625 (x3) |
| 9 | 1 x 2 x 3 x 4 x 1 |
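A hedged sketch of that final step (STRING_AGG() needs SQL Server 2017+; the FORMAT() pattern is just one way to drop trailing zeros and the "(x1)" suffix):

```sql
-- Sketch of step 3: collapse each ID's runs into a single string, omitting "(x1)".
WITH flagged AS (
    SELECT ID, Number, Size,
           CASE WHEN Size = LAG(Size) OVER (PARTITION BY ID ORDER BY Number)
                THEN 0 ELSE 1 END AS IsNewGroup
    FROM #SizeData
), grouped AS (
    SELECT ID, Number, Size,
           SUM(IsNewGroup) OVER (PARTITION BY ID ORDER BY Number
                                 ROWS UNBOUNDED PRECEDING) AS Grp
    FROM flagged
), runs AS (
    SELECT ID, MIN(Number) AS FirstNumber, Size, COUNT(*) AS Quantity
    FROM grouped
    GROUP BY ID, Grp, Size
)
SELECT ID,
       STRING_AGG(
           CONCAT(FORMAT(Size, '0.#######'),              -- up to 7 decimals, trailing zeros dropped
                  CASE WHEN Quantity > 1
                       THEN CONCAT(' (x', Quantity, ')')  -- omit the suffix for single items
                       ELSE '' END),
           ' x ') WITHIN GROUP (ORDER BY FirstNumber) AS Size
FROM runs
GROUP BY ID
ORDER BY ID;
```

FORMAT() is convenient but not the fastest option; if performance matters, a manual string-trimming approach on the decimal may be preferable.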
It's kind of a lot. In the end, I'm hoping to turn the data from these tables into a more readable format for users, but 1) I wasn't exactly sure how to do this, and 2) given that there are potentially multiple routes for accomplishing it, I wasn't sure which option was the most performant.
Any thoughts?
P.S. I am open to any method; it certainly doesn't have to follow the train of thought I included here.
*Edited to add a new dataset (ID=9)
1 Answer
You have a gaps-and-islands problem. One option is to use the difference between two ROW_NUMBER()s to define the required groups; try the following, which uses SQL Server syntax (see demo):
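One way that ROW_NUMBER()-difference idea could be written (again using the hypothetical #SizeData table and the same output-formatting assumptions as above):

```sql
-- Gaps-and-islands via two ROW_NUMBER()s: within each ID, the difference between
-- a row number over all rows and a row number per Size stays constant across each
-- consecutive run of equal sizes, so it can serve as the group key.
WITH islands AS (
    SELECT ID, Number, Size,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Number)
         - ROW_NUMBER() OVER (PARTITION BY ID, Size ORDER BY Number) AS Grp
    FROM #SizeData
), runs AS (
    SELECT ID, MIN(Number) AS FirstNumber, Size, COUNT(*) AS Quantity
    FROM islands
    GROUP BY ID, Size, Grp
)
SELECT ID,
       STRING_AGG(
           CONCAT(FORMAT(Size, '0.#######'),
                  CASE WHEN Quantity > 1 THEN CONCAT(' (x', Quantity, ')') ELSE '' END),
           ' x ') WITHIN GROUP (ORDER BY FirstNumber) AS Size
FROM runs
GROUP BY ID
ORDER BY ID;
```

Either the LAG()-based flags or the ROW_NUMBER() difference works here; both are standard gaps-and-islands techniques, so the choice mostly comes down to readability and which performs better on your data.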