SQL Server Using string_split to create rows from multiple columns

jhkqcmku  于 12个月前  发布在  其他
关注(0)|答案(2)|浏览(100)

I have data that looks something like this example (on an unfortunately much larger scale):

+----+-------+--------------------+-----------------------------------------------+
| ID | Data  | Cost               | Comments                                      |
+----+-------+--------------------+-----------------------------------------------+
| 1  | 1|2|3 | $0.00|$3.17|$42.42 | test test||previous thing has a blank comment |
+----+-------+--------------------+-----------------------------------------------+
| 2  | 1     | $420.69            | test                                          |
+----+-------+--------------------+-----------------------------------------------+
| 3  | 1|2   | $3.50|$4.20        | |test                                         |
+----+-------+--------------------+-----------------------------------------------+

Some of the columns in the table I have are pipeline delimited, but they are consistent by each row. So each delimited value corresponds to the same index in the other columns of the same row.

So I can do something like this which is what I want for a single column:

SELECT ID, s.value AS datavalue
FROM MyTable t CROSS APPLY STRING_SPLIT(t.Data, '|') s

and that would give me this:

+----+-----------+
| ID | datavalue |
+----+-----------+
| 1  | 1         |
+----+-----------+
| 1  | 2         |
+----+-----------+
| 1  | 3         |
+----+-----------+
| 2  | 1         |
+----+-----------+
| 3  | 1         |
+----+-----------+
| 3  | 2         |
+----+-----------+

but I also want to get the other columns as well (cost and comments in this example) so that the corresponding items are all in the same row like this:

+----+-----------+-----------+------------------------------------+
| ID | datavalue | costvalue | commentvalue                       |
+----+-----------+-----------+------------------------------------+
| 1  | 1         | $0.00     | test test                          |
+----+-----------+-----------+------------------------------------+
| 1  | 2         | $3.17     |                                    |
+----+-----------+-----------+------------------------------------+
| 1  | 3         | $42.42    | previous thing has a blank comment |
+----+-----------+-----------+------------------------------------+
| 2  | 1         | $420.69   | test                               |
+----+-----------+-----------+------------------------------------+
| 3  | 1         | $3.50     |                                    |
+----+-----------+-----------+------------------------------------+
| 3  | 2         | $4.20     | test                               |
+----+-----------+-----------+------------------------------------+

I'm not sure what the best or most simple way to achieve this would be

wfauudbj

wfauudbj1#

This isn't going to be achievable with STRING_SPLIT as Microsoft refuse to supply the ordinal position as part of the result set*. As a result, you'll need to use a different function which does. Personally, I recommend DelimitedSplit8k_LEAD , originally written by Jeff Moden and then improved by Eirikur Eiriksson.

Then, you can do this:

CREATE TABLE #Sample (ID int,
                      [Data] varchar(200),
                      Cost  varchar(200),
                      Comments varchar(8000));
GO
INSERT INTO #Sample
VALUES (1,'1|2|3','$0.00|$3.17|$42.42','test test||previous thing has a blank comment'),
       (2,'1','$420.69','test'),
       (3,'1|2','$3.50|$4.20','|test');

GO
SELECT S.ID,
       DSd.Item AS DataValue,
       DSc.Item AS CostValue,
       DSct.Item AS CommentValue
FROM #Sample S
     CROSS APPLY dbo.DelimitedSplit8K_LEAD(S.[Data],'|')  DSd
     CROSS APPLY (SELECT *
                  FROM DelimitedSplit8K_LEAD(S.Cost,'|') SS
                  WHERE SS.ItemNumber = DSd.ItemNumber) DSc
     CROSS APPLY (SELECT *
                  FROM DelimitedSplit8K_LEAD(S.Comments,'|') SS
                  WHERE SS.ItemNumber = DSd.ItemNumber) DSct;

GO
DROP TABLE #Sample;
GO

There is, however, only one true answer to this question: Don't store delimited values in SQL Server. Store them in a normalised manner, and you won't have this problem.

  • An ordinal parameter was added to STRING_SPLIT in SQL Server 2022.
SELECT S.ID,
       SSd.Value AS DataValue,
       SSc.Value AS CostValue,
       SSct.Value AS CommentValue
FROM #Sample S
        CROSS APPLY STRING_SPLIT(S.[Data],'|',1)  SSd
        CROSS APPLY (SELECT *
                    FROM STRING_SPLIT(S.Cost,'|',1) SS
                    WHERE SS.ordinal = SSd.ordinal) SSc
        CROSS APPLY (SELECT *
                    FROM STRING_SPLIT(S.Comments,'|',1) SS
                    WHERE SS.ordinal = SSd.ordinal) SSct;
lb3vh1jj

lb3vh1jj2#

Here is a solution approach using a recursive CTE instead of a User Defined Funtion (UDF) which is useful for those without permission to create functions.

CREATE TABLE mytable(
   ID       INTEGER  NOT NULL PRIMARY KEY 
  ,Data     VARCHAR(7) NOT NULL
  ,Cost     VARCHAR(20) NOT NULL
  ,Comments VARCHAR(47) NOT NULL
);
INSERT INTO mytable(ID,Data,Cost,Comments) VALUES (1,'1|2|3','$0.00|$3.17|$42.42','test test||previous thing has a blank comment');
INSERT INTO mytable(ID,Data,Cost,Comments) VALUES (2,'1','$420.69','test');
INSERT INTO mytable(ID,Data,Cost,Comments) VALUES (3,'1|2','$3.50|$4.20','|test');

This query allows choice of delimiter by using a variable, then using a common table expression it parses each delimited string to produce a rows for each portion of those strings, and retains the ordinal position of each.

declare @delimiter as varchar(1)
set @delimiter = '|'

;with cte as (
      select id
           , convert(varchar(max), null) as datavalue
           , convert(varchar(max), null) as costvalue
           , convert(varchar(max), null) as commentvalue
           , convert(varchar(max), data + @delimiter) as data
           , convert(varchar(max), cost + @delimiter) as cost
           , convert(varchar(max), comments + @delimiter) as comments
      from mytable as t
      union all
      select id
           , convert(varchar(max), left(data, charindex(@delimiter, data) - 1))
           , convert(varchar(max), left(cost, charindex(@delimiter, cost) - 1))
           , convert(varchar(max), left(comments, charindex(@delimiter, comments) - 1))
           , convert(varchar(max), stuff(data, 1, charindex(@delimiter, data), ''))
           , convert(varchar(max), stuff(cost, 1, charindex(@delimiter, cost), ''))
           , convert(varchar(max), stuff(comments, 1, charindex(@delimiter, comments), ''))
      from cte
      where (data like ('%' + @delimiter + '%') and cost like ('%' + @delimiter + '%')) or comments like ('%' + @delimiter + '%')
     )
select id, datavalue, costvalue, commentvalue
from cte
where datavalue IS NOT NULL
order by id, datavalue

As the recursion adds new rows, it places the first portion of the delimited strings into the wanted output columns using left() , then also, using stuff() , removes the last used delimiter from the source strings so that the next row will start at the next delimiter. Note that to initiate the extractions, the delimiter is added to the end of the source delimited strings which is to ensure the where clause does not exclude any of the wanted strings.

the result:

id   datavalue   costvalue              commentvalue             
 ---- ----------- ----------- ------------------------------------ 
   1           1   $0.00       test test                           
   1           2   $3.17                                           
   1           3   $42.42      previous thing has a blank comment  
   2           1   $420.69     test                                
   3           1   $3.50                                           
   3           2   $4.20       test

demonstrated here at dbfiddle.uk

相关问题