SQL Server Is there a way to select custom part of string in a SQL query by using commas as separators?

uz75evzq  于 12个月前  发布在  其他
关注(0)|答案(4)|浏览(88)

The Issue:

I am trying to select from multiple tables, Table A and B but this is mainly regarding the Only column I am choosing from Table B. The values in this column contain an Array of string.

Aim is to get get rid of any [] and then look at string. There are two main types of strings, one that contains only text, and are NOT comma separated, and others that has some combinations of text and comma.

What is Required:

For condition where string does not contain any commas then simply return the text value. For other cases where string does contain single or multiple commas return the values after the first comma. In cases where there are more than 2 or more commas then need to return string after the first comma and before 2nd comma.

Please look at the code for further explanation:

DROP TABLE IF EXISTS

\[dbo\].\[multi_string_db\]

GO

CREATE TABLE \[dbo\].\[multi_string_db\] (

multi_string nvarchar (256)  NULL

)

INSERT INTO \[dbo\].\[multi_string_db\] ( multi_string)

VALUES ('\[Additional time required, Time requested\]')

, ('\[Additional time required, Document requested\]')

, ('\[Additional time required, Missing documents - Personal, Other\]')

, ('\[Additional time required, Missing documents - Personal\]')

, ('Additional time required')

, ('Document Requested')

, ('Missing FPA/evidence')

, ('Missing documents - Office')

, ('Missing documents - Personal')

, ('Other')

, ('Referred to Decision Maker Team')

, ('Target date error')

Desired Results:

Desired_Output
Time requested
Document requested
Missing documents - Personal
Missing documents - Personal
Additional time required
Document Requested
Missing FPA/evidence
Missing documents - Office
Missing documents - Personal
Other
Referred to Decision Maker Team
Target date error

What I have tried so far:

SELECT 
    LTRIM(RTRIM(
        CASE 
            WHEN CHARINDEX('[', multi_string) > 0 AND CHARINDEX(']', multi_string) > 0
            THEN 
                CASE 
                    WHEN CHARINDEX(',', multi_string) > 0 AND CHARINDEX(',', multi_string, CHARINDEX(',', multi_string) + 1) > 0
                    THEN SUBSTRING(multi_string, CHARINDEX(',', multi_string) + 1, CHARINDEX(',', multi_string, CHARINDEX(',', multi_string) + 1) - CHARINDEX(',', multi_string) - 1)
                    WHEN CHARINDEX(',', multi_string) > 0
                    THEN SUBSTRING(multi_string, CHARINDEX(',', multi_string) + 1, CHARINDEX(']', multi_string) - CHARINDEX(',', multi_string) - 1)
                    ELSE SUBSTRING(multi_string, CHARINDEX('[', multi_string) + 1, CHARINDEX(']', multi_string) - CHARINDEX('[', multi_string) - 1)
                END
            ELSE
                LTRIM(RTRIM(multi_string)) -- If no brackets, return the original string
        END
    )) AS Result
            FROM dbo.[multi_string_db]

The Issue (Limitations):

The logic works fine but only when [ ] are present, which definitely is the case with current data, however if there were new data then this logic will fail.

For Example: If we have following string 'Some time required, Received documents - Unchecked, Misc' then the logic fails due to absence of [ ] around the string.

How can I overcome this limitation?

falq053o

falq053o1#

I've added in your 'problem' data in this example

DROP TABLE IF EXISTS #multi_string_db

CREATE TABLE #multi_string_db (
    multi_string nvarchar (256)  NULL
)
INSERT INTO #multi_string_db ( multi_string)
VALUES ('[Additional time required, Time requested]')
, ('[Additional time required, Document requested]')
, ('[Additional time required, Missing documents - Personal, Other]')
, ('[Additional time required, Missing documents - Personal]')
, ('Additional time required')
, ('Document Requested')
, ('Missing FPA/evidence')
, ('Missing documents - Office')
, ('Missing documents - Personal')
, ('Other')
, ('Referred to Decision Maker Team')
, ('Target date error')
, ('Some time required, Received documents - Unchecked, Misc')

SELECT 
        multi_string, 
        Result = 
                REPLACE(REPLACE(
                LTRIM(RTRIM(
                SUBSTRING(  multi_string
                        , CHARINDEX(',', multi_string)+1
                        ,  IIF(CHARINDEX(',', multi_string, CHARINDEX(',', multi_string)+1) = 0
                                            , LEN(multi_string) -  CHARINDEX(',', multi_string)
                                            , CHARINDEX(',', multi_string, CHARINDEX(',', multi_string)+1) - CHARINDEX(',', multi_string) -1
                                            )
                        )
                ))
                , '[', '')
                , ']', '')
    FROM #multi_string_db

If ignores the [ and ] to start with an d just finds the parts required for the result, then trim the result of that and replace any remaining [ and ] .

If you are using SQL Server 2022 you could use string_split() with the ordinal option (I think) and just extract the 1st or 2nd ordinal. It would be really simple then.

nzkunb0c

nzkunb0c2#

You can use STRING_SPLIT() in a subselect and filter for ordinal = 2 . Add logic that first checks for the [list, ...] syntax and strips off the brackets, you would end up with the following:

SELECT M.multi_string,
    CASE WHEN M.multi_string LIKE '\[%,%\]' ESCAPE '\'
        THEN (
            SELECT TRIM(S.value)
            FROM (SELECT SUBSTRING(M.multi_string, 2, LEN(M.multi_string)-2) AS Trimmed) T
            CROSS APPLY STRING_SPLIT(T.Trimmed, ',', 1) S
            WHERE S.ordinal = 2
        )
        ELSE M.multi_string
        END AS Result 
FROM multi_string_db M

The following variation also handles single-valued bracketed lists. returning the first (single) value in that case.

SELECT M.multi_string,
    CASE WHEN M.multi_string LIKE '\[%\]' ESCAPE '\'
        THEN (
            SELECT TOP 1 TRIM(S.value)
            FROM (SELECT SUBSTRING(M.multi_string, 2, LEN(M.multi_string)-2) AS Trimmed) T
            CROSS APPLY STRING_SPLIT(T.Trimmed, ',', 1) S
            WHERE S.ordinal <= 2
            ORDER BY S.ordinal DESC
        )
        ELSE M.multi_string
        END AS Result 
FROM multi_string_db M

See this db<>fiddle for a working demo (with a few extra test cases).

ni65a41a

ni65a41a3#

In the first step I am filtering out [ ] and saving string in a new temporary table.

SELECT
CASE LEFT (cd.multi_string,1)
WHEN '['
THEN SUBSTRING (cd.multi_string, 2, LEN(cd.multi_string) -2)
ELSE cd.multi_string
END AS multi_string
FROM multi_string_db AS cd

Then using the provided logic of STRING_SPLIT

SELECT
split_str.value
FROM multi_string_db AS cd
CROSS APPLY String_Split(multi_string, ',') AS split_str
WHERE multi_string IS NOT NULL;

It does work on SQL Server and other platforms like AWS which is handy but I am sure there is a room for improvement. For some reason ESCAPE does not work on my SQL version on AWS and since I am unsure of a global method I applied this logic. So this is not a simple/clean method and if any better methods it would be appreciated. Another drawback of this is that for any split string it gives me an unwanted white space. For Example [Additional time required, Document requested] Returns ' Document requested' instead of 'Document requested'. where there is a white space before ' D'

Any assistance is appreciated.

4xrmg8kj

4xrmg8kj4#

Please try the following solution based on tokenization via XML and XQuery. It will work starting from SQL Server 2017 onwards.

No need to do any string parsing via endless CHARINDEX() , SUBSTRING() , and other functions.

Notable points:

  • 1st CROSS APPLY is tokenizing a string via XML.
  • 2nd CROSS APPLY is counting # of tokens in a string of tokens separated by comma.
  • SELECT clause has a simple CASE statement to retrieve a correct token in a string of tokens based on token count.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, multi_string NVARCHAR(256));
INSERT INTO @tbl( multi_string) VALUES 
('[Additional time required, Time requested]'), 
('[Additional time required, Document requested]'),
('[Additional time required, Missing documents - Personal, Other]'),
('[Additional time required, Missing documents - Personal]'),
('Additional time required'),
('Document Requested'),
('Missing FPA/evidence'),
('Missing documents - Office'),
('Missing documents - Personal'),
('Other'),
('Referred to Decision Maker Team'),
('Target date error');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = ',';

SELECT ID, t.multi_string --, c, t2.token_count -- uncomment to see XML and the token count
    , result = CASE
        WHEN t2.token_count = 1 THEN TRIM(c.value('(/root/r/text())[1]', 'NVARCHAR(100)'))
        WHEN t2.token_count > 1 THEN TRIM(c.value('(/root/r[2]/text())[1]', 'NVARCHAR(100)'))
        ELSE 'Alarm! Some edge case.'
    END 
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' + 
    REPLACE(TRIM('[]' FROM multi_string), @separator, ']]></r><r><![CDATA[') + 
    ']]></r></root>' AS XML)) AS t1(c)
CROSS APPLY (SELECT c.value('count(/root/r)', 'INT')) AS t2(token_count);

Output

multi_stringresult
[Additional time required, Time requested]Time requested
[Additional time required, Document requested]Document requested
[Additional time required, Missing documents - Personal, Other]Missing documents - Personal
[Additional time required, Missing documents - Personal]Missing documents - Personal
Additional time requiredAdditional time required
Document RequestedDocument Requested
Missing FPA/evidenceMissing FPA/evidence
Missing documents - OfficeMissing documents - Office
Missing documents - PersonalMissing documents - Personal
OtherOther
Referred to Decision Maker TeamReferred to Decision Maker Team
Target date errorTarget date error

相关问题