I need to split the semicolon-separated values in two columns of a table.
My source table looks like this:
| product_id | ean_upc | is_approved |
| ------------ | ------------ | ------------ |
| 2102961 | 8710103368021;8710103368038;8710103368014 | 1;0;1 |
| 92923683 | 0195161923722;195161923722;10195161923729 | 1;1;0 |
| 656364 | 8710103203919;8710103239093;8710103203902 | 1;1;1 |
| 735699 | 8710103285885;8710103295938;8710103295952 | 0;0;1 |
What I need is a `SELECT` statement that gives me back each `product_id` and EAN code:

```sql
SELECT
    product_id, EAN.value
FROM
    IceCatUK.dbo.ice_indexCSV_tmp
CROSS APPLY
    STRING_SPLIT(ean_upc, ';') EAN
```
The problem is that I need the `is_approved` information too.
My final table needs to have `product_id`, the separated `ean_upc` and the `is_approved` flag.
Each `is_approved` value must be linked to the EAN at the same position in its list.
The result for the first row should look like this:
| product_id | ean_upc | is_approved |
| ------------ | ------------ | ------------ |
| 2102961 | 8710103368021 | 1 |
| 2102961 | 8710103368038 | 0 |
| 2102961 | 8710103368014 | 1 |
How can I link the position of each value from `STRING_SPLIT(ean_upc)` with the position of each value from `STRING_SPLIT(is_approved)`?
6 answers

Answer #1
Similar to @lemon's excellent answer, but this gathers EAN and APPROVED into separate result sets and joins them on both `product_id` and `ordinal`, avoiding the potentially large intermediate Cartesian result set produced by two chained `CROSS APPLY`s. Depending on the size of your data, this may perform better.

dbfiddle version
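A minimal sketch of the approach described above, assuming SQL Server 2022 (16.x) or Azure SQL, where `STRING_SPLIT` accepts the third `enable_ordinal` argument and then returns an `ordinal` column. The temp table `#src` is a hypothetical stand-in for `IceCatUK.dbo.ice_indexCSV_tmp`, loaded with the sample rows from the question; `#result` is a hypothetical name used only so the output can be inspected. The join also assumes `product_id` is unique in the source table.

```sql
-- Sketch only: assumes SQL Server 2022 (16.x) or Azure SQL, where STRING_SPLIT
-- accepts a third argument (enable_ordinal) and then returns an ordinal column.
-- #src is a hypothetical stand-in for IceCatUK.dbo.ice_indexCSV_tmp.
DROP TABLE IF EXISTS #src;
DROP TABLE IF EXISTS #result;

CREATE TABLE #src (
    product_id  int          NOT NULL PRIMARY KEY,  -- assumed unique per row
    ean_upc     varchar(200) NOT NULL,
    is_approved varchar(50)  NOT NULL
);

INSERT INTO #src (product_id, ean_upc, is_approved) VALUES
    (2102961,  '8710103368021;8710103368038;8710103368014', '1;0;1'),
    (92923683, '0195161923722;195161923722;10195161923729',  '1;1;0'),
    (656364,   '8710103203919;8710103239093;8710103203902', '1;1;1'),
    (735699,   '8710103285885;8710103295938;8710103295952', '0;0;1');

-- Split each column once, keeping the position STRING_SPLIT reports for every value.
WITH ean AS (
    SELECT s.product_id, e.ordinal, e.value AS ean_upc
    FROM #src AS s
    CROSS APPLY STRING_SPLIT(s.ean_upc, ';', 1) AS e
),
approved AS (
    SELECT s.product_id, a.ordinal, a.value AS is_approved
    FROM #src AS s
    CROSS APPLY STRING_SPLIT(s.is_approved, ';', 1) AS a
)
-- Pair the two sets on product and position.
SELECT ean.product_id, ean.ordinal, ean.ean_upc, approved.is_approved
INTO #result
FROM ean
INNER JOIN approved
    ON  approved.product_id = ean.product_id
    AND approved.ordinal    = ean.ordinal;

SELECT product_id, ean_upc, is_approved
FROM #result
ORDER BY product_id, ordinal;
```

Splitting each column once keeps each intermediate set at one row per value, whereas two chained `CROSS APPLY`s logically produce every EAN/flag combination per product (nine rows for three-item lists) before the ordinal filter discards most of them.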
Note that the `enable_ordinal` parameter is only available in SQL Server 2022 (16.x) and Azure SQL. It was a terrible oversight that it wasn't included when `STRING_SPLIT()` was initially rolled out.

The `ROW_NUMBER()` route in @charleschen's answer is likely the next best option, but because there is no guarantee about the order of the values spit out by `CROSS APPLY` and `STRING_SPLIT()`, it's a bit of a gamble. That being said, I've never seen SQL Server spit out `STRING_SPLIT` results in a different order than the string that was fed into it.

Ultimately, if I had an older version of SQL Server where `enable_ordinal` was not available, and this was production-level code, I would probably look at pre-processing the source data used to populate this table before ingesting it into SQL Server. You could make quick work of this in a pandas DataFrame with `pandas.Series.explode()`.

Answer #2
Starting with SQL Server 2022, you can call `STRING_SPLIT` with the third parameter set to 1 to capture the ordinal position of each value being split into rows. Then you can filter on matching ordinals.

Output:

Check the demo here.
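A minimal sketch of filtering two ordinal-aware splits on matching positions, again assuming SQL Server 2022+ and a hypothetical `#src` temp table holding the question's sample rows (`#result` is likewise a hypothetical name for inspecting the output):

```sql
-- Sketch only: SQL Server 2022+ (STRING_SPLIT with enable_ordinal = 1).
-- #src is a hypothetical stand-in for the question's table.
DROP TABLE IF EXISTS #src;
DROP TABLE IF EXISTS #result;

CREATE TABLE #src (
    product_id  int          NOT NULL,
    ean_upc     varchar(200) NOT NULL,
    is_approved varchar(50)  NOT NULL
);

INSERT INTO #src (product_id, ean_upc, is_approved) VALUES
    (2102961,  '8710103368021;8710103368038;8710103368014', '1;0;1'),
    (92923683, '0195161923722;195161923722;10195161923729',  '1;1;0'),
    (656364,   '8710103203919;8710103239093;8710103203902', '1;1;1'),
    (735699,   '8710103285885;8710103295938;8710103295952', '0;0;1');

SELECT s.product_id, e.ordinal, e.value AS ean_upc, a.value AS is_approved
INTO #result
FROM #src AS s
CROSS APPLY STRING_SPLIT(s.ean_upc,     ';', 1) AS e  -- one row per EAN, with its position
CROSS APPLY STRING_SPLIT(s.is_approved, ';', 1) AS a  -- one row per flag, with its position
WHERE e.ordinal = a.ordinal;                          -- keep only same-position pairs

SELECT product_id, ean_upc, is_approved
FROM #result
ORDER BY product_id, ordinal;
```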
Answer #3
You need an ordinal position to join on to guarantee that the correct values are joined together. In SQL Server 2022+, `STRING_SPLIT` supports an ordinal position; prior to that, you have to roll your own. You simply cannot rely on artificially adding a row number to a set, which by definition is unordered.
If you don't have a suitable function, I include one below.
Using a defined sequence, you can then do:
String split function:
Demo DBFiddle
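For versions before 2022, here is a hypothetical ordinal-returning splitter, `dbo.SplitWithOrdinal`; it is an illustration, not the function from this answer. It walks the string with a recursive CTE and `CHARINDEX`, so each item's position comes from where it sits in the string rather than from row order. Assumptions: SQL Server 2016 SP1+ (for `CREATE OR ALTER` and `DROP TABLE IF EXISTS`), a single-character, non-space delimiter, lists of at most 8000 characters, and `GO` batch separators, which SSMS and sqlcmd understand but which are not T-SQL statements. `#src` and `#result` are hypothetical temp tables.

```sql
-- Hypothetical helper (not the answer's original): returns each item of @list
-- with its 1-based position, found by walking the string with CHARINDEX.
-- Assumes a non-space delimiter (LEN ignores trailing spaces).
-- Lists with more than ~100 items need OPTION (MAXRECURSION n) on the calling query.
CREATE OR ALTER FUNCTION dbo.SplitWithOrdinal (@list varchar(8000), @delim char(1))
RETURNS TABLE
AS
RETURN
    WITH parts AS (
        -- First item: starts at position 1, ends at the first delimiter.
        SELECT CAST(1 AS int) AS ordinal,
               CAST(1 AS int) AS start_pos,
               CHARINDEX(@delim, @list + @delim) AS end_pos
        UNION ALL
        -- Next item: starts just after the previous delimiter.
        SELECT ordinal + 1,
               end_pos + 1,
               CHARINDEX(@delim, @list + @delim, end_pos + 1)
        FROM parts
        WHERE end_pos < LEN(@list + @delim)
    )
    SELECT ordinal, SUBSTRING(@list, start_pos, end_pos - start_pos) AS value
    FROM parts;
GO

DROP TABLE IF EXISTS #src;
DROP TABLE IF EXISTS #result;

CREATE TABLE #src (
    product_id  int          NOT NULL,
    ean_upc     varchar(200) NOT NULL,
    is_approved varchar(50)  NOT NULL
);

INSERT INTO #src (product_id, ean_upc, is_approved) VALUES
    (2102961,  '8710103368021;8710103368038;8710103368014', '1;0;1'),
    (92923683, '0195161923722;195161923722;10195161923729',  '1;1;0'),
    (656364,   '8710103203919;8710103239093;8710103203902', '1;1;1'),
    (735699,   '8710103285885;8710103295938;8710103295952', '0;0;1');

-- Join the two splits on the positions the helper computed.
SELECT s.product_id, e.ordinal, e.value AS ean_upc, a.value AS is_approved
INTO #result
FROM #src AS s
CROSS APPLY dbo.SplitWithOrdinal(s.ean_upc, ';') AS e
CROSS APPLY dbo.SplitWithOrdinal(s.is_approved, ';') AS a
WHERE e.ordinal = a.ordinal;

SELECT product_id, ean_upc, is_approved
FROM #result
ORDER BY product_id, ordinal;
```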
Answer #4
Just another option if you're not on 2022. This works on SQL Server 2016+.
Note: provided your data is consistent, the `string_escape(..., 'json')` could be trimmed out.

Example
Results
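A minimal sketch of the JSON technique for 2016+, not the answer's exact query: each list is rewritten as a JSON array (for example `'1;0;1'` becomes `'["1","0","1"]'`), and `OPENJSON` then returns each element's zero-based array index in its `key` column, which serves as the ordinal. `STRING_ESCAPE(..., 'json')` guards against quotes or backslashes in the data breaking the JSON. It assumes a database compatibility level of 130 or higher, which `OPENJSON` requires, and the same hypothetical `#src` and `#result` temp tables.

```sql
-- Sketch only: SQL Server 2016+, database compatibility level >= 130 (OPENJSON).
-- #src is a hypothetical stand-in for the question's table.
DROP TABLE IF EXISTS #src;
DROP TABLE IF EXISTS #result;

CREATE TABLE #src (
    product_id  int          NOT NULL,
    ean_upc     varchar(200) NOT NULL,
    is_approved varchar(50)  NOT NULL
);

INSERT INTO #src (product_id, ean_upc, is_approved) VALUES
    (2102961,  '8710103368021;8710103368038;8710103368014', '1;0;1'),
    (92923683, '0195161923722;195161923722;10195161923729',  '1;1;0'),
    (656364,   '8710103203919;8710103239093;8710103203902', '1;1;1'),
    (735699,   '8710103285885;8710103295938;8710103295952', '0;0;1');

-- '1;0;1' -> '["1","0","1"]'; OPENJSON's [key] is the array index as a string.
SELECT s.product_id,
       CAST(e.[key] AS int) AS position,  -- cast so ordering is numeric, not textual
       e.[value] AS ean_upc,
       a.[value] AS is_approved
INTO #result
FROM #src AS s
CROSS APPLY OPENJSON('["' + REPLACE(STRING_ESCAPE(s.ean_upc, 'json'), ';', '","') + '"]') AS e
CROSS APPLY OPENJSON('["' + REPLACE(STRING_ESCAPE(s.is_approved, 'json'), ';', '","') + '"]') AS a
WHERE e.[key] = a.[key];

SELECT product_id, ean_upc, is_approved
FROM #result
ORDER BY product_id, position;
```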
Answer #5
This is another way, using `string_split` and an `inner join`:

Result:

Starting with SQL Server 2022, the `STRING_SPLIT()` function supports an optional third parameter, `enable_ordinal`.

Demo here
Answer #6
To link the position of each value returned by `string_split` for EAN with the position of each value returned by `string_split` for `is_approved`, you can use the `ROW_NUMBER` function to assign a row number to each element of both split lists. Then you can join the results on the row number, like this:

This query uses Common Table Expressions (CTEs) to assign row numbers to the split values of `ean_upc` and `is_approved` using the `ROW_NUMBER` function. It then joins the two CTEs on `product_id` and the row number (`rn`) to link the corresponding elements together.

This way, the result includes the `product_id`, the separated `ean_upc`, and the corresponding `is_approved` flag for each entry.
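A minimal sketch of the `ROW_NUMBER` idea, using a hypothetical `#src` temp table with the question's sample rows; the CTE names `ean` and `approved` and the `#result` table are illustrative. As other answers in this thread point out, `STRING_SPLIT` without `enable_ordinal` does not guarantee the order of its output, so `ORDER BY (SELECT NULL)` merely numbers the rows in whatever order they arrive; treat the pairing as unverified unless you can confirm the order some other way.

```sql
-- Sketch of the ROW_NUMBER approach; works without enable_ordinal (2016+),
-- BUT the numbering follows arrival order, which STRING_SPLIT does not guarantee.
DROP TABLE IF EXISTS #src;
DROP TABLE IF EXISTS #result;

CREATE TABLE #src (
    product_id  int          NOT NULL,
    ean_upc     varchar(200) NOT NULL,
    is_approved varchar(50)  NOT NULL
);

INSERT INTO #src (product_id, ean_upc, is_approved) VALUES
    (2102961,  '8710103368021;8710103368038;8710103368014', '1;0;1'),
    (92923683, '0195161923722;195161923722;10195161923729',  '1;1;0'),
    (656364,   '8710103203919;8710103239093;8710103203902', '1;1;1'),
    (735699,   '8710103285885;8710103295938;8710103295952', '0;0;1');

WITH ean AS (
    SELECT s.product_id, e.value AS ean_upc,
           ROW_NUMBER() OVER (PARTITION BY s.product_id ORDER BY (SELECT NULL)) AS rn
    FROM #src AS s
    CROSS APPLY STRING_SPLIT(s.ean_upc, ';') AS e
),
approved AS (
    SELECT s.product_id, a.value AS is_approved,
           ROW_NUMBER() OVER (PARTITION BY s.product_id ORDER BY (SELECT NULL)) AS rn
    FROM #src AS s
    CROSS APPLY STRING_SPLIT(s.is_approved, ';') AS a
)
-- Join on product and the (arrival-order) row number.
SELECT ean.product_id, ean.rn, ean.ean_upc, approved.is_approved
INTO #result
FROM ean
INNER JOIN approved
    ON  approved.product_id = ean.product_id
    AND approved.rn         = ean.rn;

SELECT product_id, ean_upc, is_approved
FROM #result
ORDER BY product_id, rn;
```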