SQL Server String_Split multiple columns

Posted by zu0ti5jz on 2023-05-28 in Other

I need to split the values of two columns in a table. Each column holds a semicolon-separated list.

My source table looks like:
| product_id | ean_upc | is_approved |
| ------------ | ------------ | ------------ |
| 2102961 | 8710103368021;8710103368038;8710103368014 | 1;0;1 |
| 92923683 | 0195161923722;195161923722;10195161923729 | 1;1;0 |
| 656364 | 8710103203919;8710103239093;8710103203902 | 1;1;1 |
| 735699 | 8710103285885;8710103295938;8710103295952 | 0;0;1 |

What I need is a SELECT statement that gives me back each product_id and EAN code:

SELECT 
    product_id, EAN.value
FROM
    IceCatUK.dbo.ice_indexCSV_tmp
CROSS APPLY
    STRING_SPLIT(ean_upc , ';') EAN

The problem is that I need to have the is_approved information too.

My final table needs to have the product_id, each separated ean_upc value, and its is_approved flag.

Each is_approved value must be paired with the EAN at the same position in its list.

The result for the first line should look like this:
| product_id | ean_upc | is_approved |
| ------------ | ------------ | ------------ |
| 2102961 | 8710103368021 | 1 |
| 2102961 | 8710103368038 | 0 |
| 2102961 | 8710103368014 | 1 |

How do I link the position in string_split(ean_upc) with the position in string_split(is_approved)?

jljoyd4f1#

Similar to @lemon's excellent answer, but this splits EAN and APPROVED into separate result sets and joins them on both product_id and ordinal. That avoids the potentially large intermediate Cartesian product that two chained CROSS APPLY calls generate before filtering. Depending on the size of your data, this may perform better.

SELECT ice_indexCSV_tmp.product_id,
    EAN.value AS ean_upc,
    dt.is_approved
FROM ice_indexCSV_tmp
    -- Split ean_upc; the third argument (1) adds an ordinal column
    CROSS APPLY STRING_SPLIT(ean_upc, ';', 1) EAN
    INNER JOIN
    (
        -- Split is_approved separately, keeping each value's ordinal
        SELECT ice.product_id, APPROVED.value AS is_approved, APPROVED.ordinal
        FROM ice_indexCSV_tmp ice
            CROSS APPLY STRING_SPLIT(is_approved, ';', 1) APPROVED
    ) dt
        ON EAN.ordinal = dt.ordinal
        AND ice_indexCSV_tmp.product_id = dt.product_id

Note that the enable_ordinal parameter is only available in SQL Server 2022 (16.x) and later, or in Azure SQL databases. It is a terrible oversight that it wasn't included when string_split() was initially rolled out.

The row_number route in @charleschen's answer is likely the next best option, but because there is no guarantee about the order of the values spit out by the cross-apply and string_split() it's a bit of a gamble. That being said, I've never seen SQL Server spit out string_split results in a different order than the string that was fed into it.

Ultimately, if I had an older version of SQL Server where enable_ordinal was not available, and this was production-level code, I would probably look at pre-processing the source data used to populate this table before ingesting it into SQL Server. You could make quick work of this in a pandas dataframe with pandas.Series.explode().
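
As a rough sketch of that pre-processing idea, here is a standard-library Python version (no pandas required) that pairs the two lists by position before the data ever reaches SQL Server. The function name explode_rows and the in-memory sample rows are assumptions for illustration only; in practice the input would come from whatever file populates the table.

```python
def explode_rows(rows, sep=";"):
    """Yield one (product_id, ean_upc, is_approved) tuple per list position.

    `rows` is an iterable of (product_id, ean_upc, is_approved) tuples in
    which the last two fields are `sep`-delimited strings (hypothetical
    input shape, mirroring the table in the question).
    """
    for product_id, ean_upc, is_approved in rows:
        eans = ean_upc.split(sep)
        flags = is_approved.split(sep)
        # Refuse to guess when the two lists cannot be paired one-to-one.
        if len(eans) != len(flags):
            raise ValueError(
                f"product {product_id}: {len(eans)} EANs but {len(flags)} flags"
            )
        # zip() pairs elements by position, which is exactly the "ordinal" match.
        for ean, flag in zip(eans, flags):
            yield product_id, ean, flag


# Sample data copied from the question; values stay strings so that
# leading zeros in EAN codes are preserved.
source_rows = [
    (2102961, "8710103368021;8710103368038;8710103368014", "1;0;1"),
    (92923683, "0195161923722;195161923722;10195161923729", "1;1;0"),
]

exploded = list(explode_rows(source_rows))
for row in exploded:
    print(row)
```

With pandas 1.3 or later, splitting both columns into lists and passing both column names to DataFrame.explode should give the same one-row-per-position result more compactly.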

rqqzpn5f2#

Starting with SQL Server 2022, you can call STRING_SPLIT with the third parameter set to 1 to return the ordinal position of each substring alongside its value. You can then filter on matching ordinals.

SELECT tab.product_id,
       single_ean_upc.value AS ean_upc,
       single_is_approved.value AS is_approved
FROM tab
CROSS APPLY STRING_SPLIT(tab.ean_upc, ';', 1) AS single_ean_upc
CROSS APPLY STRING_SPLIT(tab.is_approved, ';', 1) AS single_is_approved
WHERE single_ean_upc.ordinal =  single_is_approved.ordinal

Output:

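| product_id | ean_upc | is_approved |
| ------------ | ------------ | ------------ |
| 2102961 | 8710103368021 | 1 |
| 2102961 | 8710103368038 | 0 |
| 2102961 | 8710103368014 | 1 |
| 92923683 | 0195161923722 | 1 |
| 92923683 | 195161923722 | 1 |
| 92923683 | 10195161923729 | 0 |
| 656364 | 8710103203919 | 1 |
| 656364 | 8710103239093 | 1 |
| 656364 | 8710103203902 | 1 |
| 735699 | 8710103285885 | 0 |
| 735699 | 8710103295938 | 0 |
| 735699 | 8710103295952 | 1 |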

68bkxrlz3#

You need an ordinal position to join on in order to guarantee that the correct values are paired together. In SQL Server 2022+, String_Split can return an ordinal position; prior to that, you have to roll your own. You simply cannot rely on artificially adding a row number to a set which is, by definition, unordered.

If you don't have a suitable function, I include one below.

Using a defined sequence you can then do:

with ean as (
    select Product_Id, Part Ean_UPC, Seq
    from t
    cross apply dbo.stringsplit(Ean_Upc)
), a as (
    select Product_Id, Part Is_Approved, Seq
    from t
    cross apply dbo.stringsplit(Is_Approved)
)
select ean.Product_Id, ean.Ean_UPC, a.Is_Approved
from ean join a on ean.Product_Id = a.Product_Id and ean.Seq = a.Seq;

String split function:

create or alter function dbo.StringSplit(@String Varchar(100))
returns Table with schemabinding as
return
with 
    t1(n) as (select n from(values(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n(n)), 
    t2(n) as (select 1 from t1 x cross join t1 y), /* 100 rows for 100 chars */
    t(n) as (select top (IsNull(DataLength(@string), 0)) Row_Number() over (order by (select 1/0)) from t2),
    s(n1) as (
        select 1 
        union all
        select t.n + 1 
        from t
        where Substring(@string, t.n, 1) =';' /* Return start positions for each part in string */
    ),
    pos as (select n1, IsNull(Lead(n1) over(order by n1), 100) nxt from s)
select part, 
    Row_Number() over(order by n1) Seq
from pos
cross apply(values(Substring(@string, n1, nxt - n1 - 1)))w(part)
where DataLength(part) > 0;

htrmnn0y4#

Just another option if you are not on SQL Server 2022. This works on SQL Server 2016 and later.

Note: if your data is known to contain no characters that need JSON escaping, the string_escape(..., 'json') calls could be trimmed out.

Example

Select A.product_id
      ,B.*
 From  YourTable A
 Cross Apply (
              Select [ean_upc]     = B1.Value
                    ,[is_approved] = B2.Value
               From  OpenJSON( '["'+replace(string_escape([ean_upc],'json'),';','","')+'"]' ) B1
               Join  OpenJSON( '["'+replace(string_escape([is_approved],'json'),';','","')+'"]' ) B2
               On B1.[Key]=B2.[Key]
             ) B
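
To make the string manipulation easier to follow, here is a small Python illustration of the same trick (an analogy, not something to run against SQL Server): the delimited string is rewritten into JSON array text, parsed, and the array index then plays the role of OPENJSON's zero-based [key]. The helper name to_json_array is made up for this sketch.

```python
import json


def to_json_array(delimited, sep=";"):
    """Rewrite 'a;b;c' as the JSON text '["a","b","c"]'.

    Mirrors '["' + replace(string_escape(col, 'json'), ';', '","') + '"]'.
    """
    # json.dumps escapes quotes, backslashes and control characters; the
    # slice drops the surrounding quotes it adds around the whole string.
    escaped = json.dumps(delimited)[1:-1]
    return '["' + escaped.replace(sep, '","') + '"]'


eans = json.loads(to_json_array("8710103368021;8710103368038;8710103368014"))
flags = json.loads(to_json_array("1;0;1"))

# enumerate() supplies the zero-based index, like OPENJSON's [key] column.
flags_by_key = dict(enumerate(flags))

# Keep only keys present on both sides, as the inner join on [Key] does.
paired = [
    (key, ean, flags_by_key[key])
    for key, ean in enumerate(eans)
    if key in flags_by_key
]

for row in paired:
    print(row)
```

Because the pairing key comes from the position inside the JSON array rather than from the order in which rows happen to be returned, this approach does not depend on STRING_SPLIT's output order.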

qnyhuwrf5#

This is another way, using string_split and an inner join:

select ean.product_id, ean.ean_upc, i.is_approved
from (
  select e.product_id, value as ean_upc, ordinal
  from mytable e
  Cross Apply string_split(ean_upc, ';', 1) b
  where e.product_id = 2102961
) as ean
INNER JOIN (
  select product_id, value as is_approved, ordinal
  from mytable
  Cross Apply string_split(is_approved,';', 1) b
  where product_id = 2102961
) as i on i.product_id = ean.product_id and i.ordinal = ean.ordinal

Result :

product_id  ean_upc         is_approved
2102961     8710103368021   1
2102961     8710103368038   0
2102961     8710103368014   1

Starting with SQL Server 2022, the STRING_SPLIT() function supports an optional third parameter, enable_ordinal.

rryofs0p6#

To link the position in the string_split output for ean_upc with the position in the string_split output for is_approved, you can use the ROW_NUMBER function to assign a row number to each element of both split lists. Then you can join the results on the row number, like this:

WITH EAN_CTE AS (
    SELECT product_id, value AS ean, ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY (SELECT NULL)) AS rn
    FROM IceCatUK.dbo.ice_indexCSV_tmp
    CROSS APPLY string_split(ean_upc, ';')
), Approved_CTE AS (
    SELECT product_id, value AS is_approved, ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY (SELECT NULL)) AS rn
    FROM IceCatUK.dbo.ice_indexCSV_tmp
    CROSS APPLY string_split(is_approved, ';')
)
SELECT EAN_CTE.product_id, EAN_CTE.ean, Approved_CTE.is_approved
FROM EAN_CTE
JOIN Approved_CTE ON EAN_CTE.product_id = Approved_CTE.product_id AND EAN_CTE.rn = Approved_CTE.rn

This query uses Common Table Expressions (CTEs) to assign row numbers to the separated values of ean_upc and is_approved using the ROW_NUMBER function. Then it joins the two CTEs on product_id and the row number (rn) to link the corresponding elements together.

This way, the result will include the product_id, the separated ean_upc, and the corresponding is_approved flag for each entry.
