SQL Server : querying XML from column. Getting duplicate values

uyto3xhc  于 2023-08-02  发布在  SQL Server
关注(0)|答案(1)|浏览(106)

I am working on learning how to query for data from a column where XML files are stored.

The XML files contain some information as comment as I am trying to fetch since the comment isn't in the file in all cases. It all depends on the response code in the XML file.

I am at a point where I am starting to confuse myself from trying different ways.

The files can have 1 or more (5, 10, 100, 1000 etc) PaymentStatus nodes.

I need to fetch the following from the files:

  1. ReasonCode/Code
  2. ReasonInformation
  3. Comments in the file

I am having 2 issues:

Problem #1

  • Some files contain information ( ReasonInformation (in a node which I do not need)). This information is of no use. This value will be different between files, so I can't use a case/IIF to skip it
  • I need the value from the ReasonCode and ReasonInformation node below.
  • I need to skip the first one always, but fetch all the ReasonCode and ReasonInformation below.

Problem #2

  • Comment is duplicated to the all the ReasonCode and ReasonInformation in the node.
  • I'd like it to only exist for the ReasonCode and ReasonInformation node it belongs to.

The result I am looking for:
| Code | ReasonInformation | Comment |
| ------------ | ------------ | ------------ |
| 011 | Creditor account info invalid or missing | This is a comment |
| 012 | Creditor payment info invalid or missing | |
| 001 | | |
| 034 | Invoice number is mandatory | Invoice number is invalid or missing |

SQL fiddle, XML files and SQLs for creating the table

Adding SQL fiddle as well as the XML files and SQLs to create the SQL fiddle if it were to not work, or if it stops working at a later time in case others are struggling with the same problem.

SQL fiddle: http://sqlfiddle.com/#!18/c42a24/3

XML files

XML file with ID = 1

<Document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:abc:Example">
<CInfo>
        <Id>12345678</Id>
        <Created>2023-07-12</Created>
        <Customer>
            <Id>
                <Org>Demo Org</Org>
                <Id>88343434</Id>
            </Id>
        </Customer>
    </CInfo>
    <PaymentStatus>
        <OrgPaymentFileID>1234567</OrgPaymentFileID>
        <PaymentStatusInfo>
            <PaymentId>422049343</PaymentId>
            <Status>Rejected</Status>
            <StatusInfo>
                <ReasonCode>
                    <Code>011</Code>
                </ReasonCode>
                <!-- This is a comment -->
                <ReasonInfo>Creditor account info invalid or missing</ReasonInfo>
            </StatusInfo>
            <StatusInfo>
                <ReasonCode>
                    <Code>012</Code>
                </ReasonCode>
                <ReasonInfo>Creditor payment info invalid or missing</ReasonInfo>
            </StatusInfo>
        </PaymentStatusInfo>
        <PaymentStatusInfo>
            <PaymentId>24234234</PaymentId>
            <Status>Rejected</Status>
            <StatusInfo>
                <ReasonCode>
                    <Code>011</Code>
                </ReasonCode>
                <ReasonInfo>Creditor account info invalid or missing</ReasonInfo>
            </StatusInfo>
            <StatusInfo>
                <ReasonCode>
                    <Code>012</Code>
                </ReasonCode>
                <ReasonInfo>Creditor payment info invalid or missing</ReasonInfo>
            </StatusInfo>
        </PaymentStatusInfo>
        <PaymentStatusInfo>
            <PaymentId>8453453</PaymentId>
            <Status>Accepted</Status>
            <StatusInfo>
                <ReasonCode>
                    <Code>001</Code>
                </ReasonCode>
            </StatusInfo>
        </PaymentStatusInfo>
    </PaymentStatus>
</Document>

XML file with ID = 2 :

<Document xmlns="urn:abc:Example">
<CInfo>
        <Id>3852323</Id>
        <Created>2023-07-12</Created>
        <Customer>
            <Id>
                <Org>Demo Org</Org>
                <Id>88343434</Id>
            </Id>
        </Customer>
    </CInfo>
    <PaymentStatus>
        <OrgPaymentFileID>349232</OrgPaymentFileID>
        <PaymentStatusInfo>
            <PaymentId>193483</PaymentId>
            <Status>Rejected</Status>
            <StatusInfo>
                <ReasonInfo>123345345</ReasonInfo>
            </StatusInfo>
            <StatusInfo>
                <ReasonCode>
                    <Code>034</Code>
                    <!-- Invoice number is invalid or missing -->
                </ReasonCode>
                <ReasonInfo>Invoice number is mandatory</ReasonInfo>
            </StatusInfo>
        </PaymentStatusInfo>
        <PaymentStatusInfo>
            <PaymentId>4083421</PaymentId>
            <Status>Accepted</Status>
            <StatusInfo>
                <ReasonCode>
                    <Code>001</Code>
                </ReasonCode>
            </StatusInfo>
        </PaymentStatusInfo>
    </PaymentStatus>
</Document>

SQL queries for SQL fiddle

Create table:

CREATE TABLE MyXmlTable
(
    ID Int,
    XmlData varchar(max)
)
  
INSERT INTO MyXmlTable 
VALUES (1,
   '<Document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:abc:Example"> <CInfo>         <Id>12345678</Id>         <Created>2023-07-12</Created>         <Customer>             <Id>                 <Org>Demo Org</Org>                 <Id>88343434</Id>             </Id>         </Customer>     </CInfo>     <PaymentStatus>         <OrgPaymentFileID>1234567</OrgPaymentFileID>         <PaymentStatusInfo>             <PaymentId>422049343</PaymentId>             <Status>Rejected</Status>             <StatusInfo>                 <ReasonCode>                     <Code>011</Code>                 </ReasonCode>                 <!-- This is a comment -->                 <ReasonInfo>Creditor account info invalid or missing</ReasonInfo>             </StatusInfo>             <StatusInfo>                 <ReasonCode>                     <Code>012</Code>                 </ReasonCode>                 <ReasonInfo>Creditor payment info invalid or missing</ReasonInfo>             </StatusInfo>         </PaymentStatusInfo>         <PaymentStatusInfo>             <PaymentId>24234234</PaymentId>             <Status>Rejected</Status>             <StatusInfo>                 <ReasonCode>                     <Code>011</Code>                 </ReasonCode>                 <ReasonInfo>Creditor account info invalid or missing</ReasonInfo>             </StatusInfo>             <StatusInfo>                 <ReasonCode>                     <Code>012</Code>                 </ReasonCode>                 <ReasonInfo>Creditor payment info invalid or missing</ReasonInfo>             </StatusInfo>         </PaymentStatusInfo>         <PaymentStatusInfo>             <PaymentId>8453453</PaymentId>             <Status>Accepted</Status>             <StatusInfo>                 <ReasonCode>                     <Code>001</Code>                 </ReasonCode>             </StatusInfo>         </PaymentStatusInfo>     </PaymentStatus> </Document>')
   
INSERT INTO MyXmlTable 
VALUES (2,
   ' <Document xmlns="urn:abc:Example"> <CInfo>         <Id>3852323</Id>         <Created>2023-07-12</Created>         <Customer>             <Id>                 <Org>Demo Org</Org>                 <Id>88343434</Id>             </Id>         </Customer>     </CInfo>     <PaymentStatus>         <OrgPaymentFileID>349232</OrgPaymentFileID>         <PaymentStatusInfo>             <PaymentId>193483</PaymentId>             <Status>Rejected</Status>             <StatusInfo>                 <ReasonInfo>123345345</ReasonInfo>             </StatusInfo>             <StatusInfo>                 <ReasonCode>                     <Code>034</Code>                     <!-- Invoice number is invalid or missing -->                 </ReasonCode>                 <ReasonInfo>Invoice number is mandatory</ReasonInfo>             </StatusInfo>         </PaymentStatusInfo>         <PaymentStatusInfo>             <PaymentId>4083421</PaymentId>             <Status>Accepted</Status>             <StatusInfo>                 <ReasonCode>                     <Code>001</Code>                 </ReasonCode>             </StatusInfo>         </PaymentStatusInfo>     </PaymentStatus> </Document> '
   )

Query:

WITH MyTable AS
(
    SELECT
        -- xmldata
        tabl.col.query('.') AS xmlExtract,
        IIF(CHARINDEX('<!--', CONVERT(varchar(max), xmldata)) > 0,
            SUBSTRING(CONVERT(varchar(max), xmldata), 
                      CHARINDEX('<!--', CONVERT(varchar(max), xmldata)) + 4,
                     (CHARINDEX('-->', CONVERT(varchar(max), xmldata)) - CHARINDEX('<!--', CONVERT(varchar(max), xmldata)) ) -4),
            NULL) AS Comment
    FROM
        (SELECT CAST(xmldata AS xml) AS xmldata FROM myxmltable) AS srctbl
         CROSS APPLY xmldata.nodes('//*:PaymentStatus/*:PaymentStatusInfo/*:StatusInfo') AS Tabl(col)
)
SELECT
    CONVERT(varchar(max),
            xmlextract.query('for $value in /*:StatusInfo/*:ReasonCode/*:Code
    return string($value)')),
    CONVERT(varchar(max),
            xmlextract.query('
    for $value in /*:StatusInfo/*:ReasonInfo
    return concat(string($value), ";")')
),
    Comment
FROM
    mytable
bprjcwpo

bprjcwpo1#

Looks like you are completely overcomplicating it.

You can get comment nodes in XQuery using the comment() function.

And you just need to add a [] predicate in a .nodes call to check for ReasonCode , in order to filter out the StatusInfo nodes you don't want.

So something like this

WITH XMLNAMESPACES (DEFAULT 'urn:abc:Example')
SELECT
  ReasonCode = StatusInfo.value('(ReasonCode/Code/text())[1]', 'char(3)'),
  ReasonInfo = StatusInfo.value('(ReasonInfo/text())[1]', 'varchar(1000)'),
  Comment    = StatusInfo.value('(.//comment())[1]', 'varchar(1000)')
FROM MyXmlTable t
CROSS APPLY (VALUES(
    CAST(t.XmlData AS xml)
)) v(xmlVal)
CROSS APPLY v.xmlVal.nodes('
  Document/PaymentStatus/PaymentStatusInfo/StatusInfo [ReasonCode]
  ') x1(StatusInfo);

I note that the comment seems to come in different places, I've used (.//comment())[1] to just get the first comment in any descendant node.

Also, use WITH XMLNAMESPACES rather than *: to search by namespace, as this is more performant.

You should also consider converting the actual column's data type to XML, rather than converting it every time you want to query it.

db<>fiddle

相关问题