SQL Server Use XML as a SQL table

cgvd09ve  于 2023-10-15  发布在  其他
关注(0)|答案(4)|浏览(116)

Using SQL Server 2016, primarily.

I have data available to me that is formatted in XML, but I need to be able to join this data to other tables in a SQL Server database.

I have looked around, but the examples I have found use a different data layout in the XML. The examples have distinct node names in the document. My XML has a node named metadata that defines the name, data type, and length of each data item . In the data node, each row has value nodes that correspond with the item nodes in the metadata node. Also, my XML uses namespacing. None of the examples I have found do. I have had problems with namespacing in other SQL/XML tasks I have worked through, so that is probably significant.

How can I use SQL to convert something like this...

<?xml version="1.0" encoding="utf-8"?>
<dataset  xmlns="http://developer.cognos.com/schemas/xmldata/1/"  xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
<!--
<dataset
    xmlns="http://developer.cognos.com/schemas/xmldata/1/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
    xs:schemaLocation="http://developer.cognos.com/schemas/xmldata/1/ xmldata.xsd"
>
-->
    <metadata>
          <item name="Task" type="xs:string" length="-1"/>
          <item name="Task Number" type="xs:string" length="-1"/>
          <item name="Group" type="xs:string" length="-1"/>
          <item name="Work Order" type="xs:string" length="-1"/>
    </metadata>
    <data>
        <row>
            <value>3361B11</value>
            <value>1</value>
            <value>01</value>
            <value>MS7579</value>
        </row>
        <row>
            <value>3361B11</value>
            <value>2</value>
            <value>50</value>
            <value>MS7579</value>
        </row>
        <row>
            <value>3361B11</value>
            <value>3</value>
            <value>02</value>
            <value>JA0520</value>
        </row>
    </data>
</dataset>

...into a tabular format like...

TaskTask NumberGroupWork Order
3361B11101MS7579
3361B11250MS7579
3361B11302JA0520

...so I can join it to other data.

I think the first step would be to query the metadata node to get the column name, data type, and length.

input (boiled down)

<?xml version="1.0" encoding="utf-8"?>
<dataset  xmlns="http://developer.cognos.com/schemas/xmldata/1/"  xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
    <metadata>
          <item name="Task" type="xs:string" length="-1"/>
          <item name="Task Number" type="xs:string" length="-1"/>
          <item name="Group" type="xs:string" length="-1"/>
          <item name="Work Order" type="xs:string" length="-1"/>
    </metadata>
</dataset>

output

idnametypelength
1Taskstring-1
2Task Numberstring-1
3Groupstring-1
4Work Orderstring-1

Once that is obtained, the next part would be generated dynamically using those values.

In case it is helpful for fiddling, here is a SQL statement that produces the desired result:

with
xlTask as (
  select *
  from (
    values
      ('3361B11', '1', '01', 'MS7579')
    , ('3361B11', '2', '50', 'MS7579')
    , ('3361B11', '3', '02', 'JA0520')
  ) q ([Task], [Task Number], [Group], [Work Order])
)
select *
from xlTask
uoifb46i

uoifb46i1#

Please try the following solution.

It will work on the condition if all <value> XML elements always have the same sequential order, from 1 to 4.

After that having data as a relational/rectangular dataset you can join it with real tables/views in the database.

SQL

DECLARE @xml XML =
'<dataset xmlns="http://developer.cognos.com/schemas/xmldata/1/"
         xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
    <metadata>
        <item name="Task" type="xs:string" length="-1"/>
        <item name="Task Number" type="xs:string" length="-1"/>
        <item name="Group" type="xs:string" length="-1"/>
        <item name="Work Order" type="xs:string" length="-1"/>
    </metadata>
    <data>
        <row>
            <value>3361B11</value>
            <value>1</value>
            <value>01</value>
            <value>MS7579</value>
        </row>
        <row>
            <value>3361B11</value>
            <value>2</value>
            <value>50</value>
            <value>MS7579</value>
        </row>
        <row>
            <value>3361B11</value>
            <value>3</value>
            <value>02</value>
            <value>JA0520</value>
        </row>
    </data>
</dataset>';

WITH XMLNAMESPACES(DEFAULT 'http://developer.cognos.com/schemas/xmldata/1/')
SELECT c.value('(value[1]/text())[1]', 'VARCHAR(20)') AS Task
    , c.value('(value[2]/text())[1]', 'VARCHAR(20)') AS [Task Number]
    , c.value('(value[3]/text())[1]', 'VARCHAR(20)') AS [Group]
    , c.value('(value[4]/text())[1]', 'VARCHAR(20)') AS [Work Order]
FROM @xml.nodes('/dataset/data/row') AS t(c);

Output

TaskTask NumberGroupWork Order
3361B11101MS7579
3361B11250MS7579
3361B11302JA0520
ukdjmx9f

ukdjmx9f2#

If you don't know the ordering you can pull that out of the metadata node, and pass it in to predicates using the sql:column function.

Unfortunately, SQL Server does not support the returning the position(.) XQuery function outside a predicate, so we need to hack it by counting nodes.

WITH XMLNAMESPACES(DEFAULT 'http://developer.cognos.com/schemas/xmldata/1/')
SELECT
  c.value('(value[position() = sql:column("pos.Task"      )]/text())[1]', 'VARCHAR(20)') AS Task,
  c.value('(value[position() = sql:column("pos.TaskNumber")]/text())[1]', 'VARCHAR(20)') AS TaskNumber,
  c.value('(value[position() = sql:column("pos.Group"     )]/text())[1]', 'VARCHAR(20)') AS [Group],
  c.value('(value[position() = sql:column("pos.WorkOrder" )]/text())[1]', 'VARCHAR(20)') AS WorkOrder
FROM (
    SELECT
      Task       = md.md.value('let $i := item[@name = "Task"       ][1] return count(./item[. << $i]) + 1', 'int'),
      TaskNumber = md.md.value('let $i := item[@name = "Task Number"][1] return count(./item[. << $i]) + 1', 'int'),
      [Group]    = md.md.value('let $i := item[@name = "Group"      ][1] return count(./item[. << $i]) + 1', 'int'),
      WorkOrder  = md.md.value('let $i := item[@name = "Work Order" ][1] return count(./item[. << $i]) + 1', 'int')
    FROM @xml.nodes('dataset/metadata') md(md)
) pos
CROSS APPLY @xml.nodes('/dataset/data/row') AS t(c);

The logic is as follows:

  • Within a subquery:

  • In XQuery, take the metadata node, and look for a descendant item which has name = "Task" and so on. Assign that node to $i .

  • Return the count of item nodes that are before $i plus one.

  • And so on for each column name.

  • Then shred the rest of the XML as normal.

  • In the predicates, use position() = sql:column("YourColumnName") which gives us the previously calculated position.

db<>fiddle

If you want a completely dynamic solution, with dynamic column names, then you need dynamic SQL.

DECLARE @cols nvarchar(max);

WITH XMLNAMESPACES(DEFAULT 'http://developer.cognos.com/schemas/xmldata/1/')
SELECT
  @cols = CAST(@xml.query('
  text {
    for $i in dataset/metadata/item
    return concat(
      ", c.value(''(value[",
      string(count(./item[. << $i]) + 1),
      "]/text())[1]'', ''VARCHAR(100)'') AS [",
      $i/@name,
      "]
 "
    )
  }
  ') AS nvarchar(max));

SET @cols = STUFF(@cols, 1, 2, N'');

DECLARE @sql nvarchar(max) = '
WITH XMLNAMESPACES(DEFAULT ''http://developer.cognos.com/schemas/xmldata/1/'')
SELECT
    ' + @cols + '
FROM @xml.nodes(''/dataset/data/row'') AS t(c);
';

PRINT @sql;  -- your friend

EXEC sp_executesql @sql,
  N'@xml xml',
  @xml = @xml;

Note that sp_executesql is used, in order to pass the @xml variable all the way through.

db<>fiddle

gdx19jrr

gdx19jrr3#

Here's a dynamic solution from your sample data:

DECLARE @SQL NVARCHAR(MAX)

;WITH XMLNAMESPACES(DEFAULT 'http://developer.cognos.com/schemas/xmldata/1/')
SELECT  @SQL = N'WITH XMLNAMESPACES(DEFAULT ''http://developer.cognos.com/schemas/xmldata/1/'')
SELECT '
 + STRING_AGG('c.value(''(value[' + T.X.value('let $i := . return count(/dataset/metadata/item[. << $i]) + 1', 'NVARCHAR(MAX)') + N']/text())[1]'', ''VARCHAR(20)'') AS '
 + QUOTENAME(T.X.value('@name', 'nvarchar(100)')) , ',')
 + N'
FROM @xml.nodes(''/dataset/data/row'') AS t(c)'
FROM    @xml.nodes('/dataset/metadata/item') as T(X);
EXEC SP_EXECUTESQL @SQL, N'@xml XML', @xml = @xml

It reads the metadata and generates sql that reads actual rows and creates proper aliases

ldxq2e6h

ldxq2e6h4#

Maybe you can try to create something to convert any XML to a SQL Table and then use SQL to extract the data

You can try the following:

USE:

select *  from Utility.FlattenXml('your xml’)
    
PROCEDURE:

USE [YOUR_DB] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO
 
CREATE FUNCTION [Utility].[FlattenXml](@xmlDoc XML)   RETURNS TABLE 
 AS RETURN  WITH CTE AS (

SELECT 
        1 AS lvl,  
        x.value('local-name(.)','NVARCHAR(MAX)') AS Name,  
        CAST(NULL AS NVARCHAR(MAX)) AS ParentName, 
        CAST(1 AS INT) AS ParentPosition, 
        CAST(N'Element' AS NVARCHAR(20)) AS NodeType,  
        x.value('local-name(.)','NVARCHAR(MAX)') AS FullPath,  
        x.value('local-name(.)','NVARCHAR(MAX)')  
        + N'[' 
        + CAST(ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS NVARCHAR)  
        + N']' AS XPath,  
        ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS Position, 
        x.value('local-name(.)','NVARCHAR(MAX)') AS Tree,  
        x.value('text()[1]','NVARCHAR(MAX)') AS Value,  
        x.query('.') AS this,         
        x.query('*') AS t,  
        CAST(CAST(1 AS VARBINARY(4)) AS VARBINARY(MAX)) AS Sort,  
        CAST(1 AS INT) AS ID  
FROM @xmlDoc.nodes('/*') a(x)  
UNION ALL 
SELECT 
        p.lvl + 1 AS lvl,  
        c.value('local-name(.)','NVARCHAR(MAX)') AS Name,  
        CAST(p.Name AS NVARCHAR(MAX)) AS ParentName, 
        CAST(p.Position AS INT) AS ParentPosition, 
        CAST(N'Element' AS NVARCHAR(20)) AS NodeType,  
        CAST(p.FullPath + N'/' + c.value('local-name(.)','NVARCHAR(MAX)') AS NVARCHAR(MAX)) AS FullPath,  
        CAST(p.XPath + N'/'+ c.value('local-name(.)','NVARCHAR(MAX)')+ N'['+ CAST(ROW_NUMBER() OVER(PARTITION BY c.value('local-name(.)','NVARCHAR(MAX)') 
        ORDER BY (SELECT 1)) AS NVARCHAR)+ N']' AS NVARCHAR(MAX)) AS XPath,  
        ROW_NUMBER() OVER(PARTITION BY c.value('local-name(.)','NVARCHAR(MAX)')
        ORDER BY (SELECT 1)) AS Position, 
        CAST( SPACE(2 * p.lvl - 1) + N'|' + REPLICATE(N'-', 1) + c.value('local-name(.)','NVARCHAR(MAX)') AS NVARCHAR(MAX)) AS Tree,  
        CAST( c.value('text()[1]','NVARCHAR(MAX)') AS NVARCHAR(MAX) ) AS Value, c.query('.') AS this,  
        c.query('*') AS t,  
        CAST(p.Sort + CAST( (lvl + 1) * 1024 + (ROW_NUMBER() OVER(ORDER BY (SELECT 1)) * 2) AS VARBINARY(4)) AS VARBINARY(MAX) ) AS Sort,  
        CAST((lvl + 1) * 1024 + (ROW_NUMBER() OVER(ORDER BY (SELECT 1)) * 2) AS INT)
        
FROM CTE p  
CROSS APPLY p.t.nodes('*') b(c)), cte2 AS (  
                                            SELECT 
                                            lvl AS Depth,  
                                            Name AS NodeName,  
                                            ParentName, 
                                            ParentPosition, 
                                            NodeType,  
                                            FullPath,  
                                            XPath,  
                                            Position, 
                                            Tree AS TreeView,  
                                            Value,  
                                            this AS XMLData,  
                                            Sort, 
                                            ID
                                            FROM cte  
UNION ALL 
SELECT 
        p.lvl,  
        x.value('local-name(.)','NVARCHAR(MAX)'),  
        p.Name, 
        p.Position, 
        CAST(N'Attribute' AS NVARCHAR(20)),  
        p.FullPath + N'/@' + x.value('local-name(.)','NVARCHAR(MAX)'),  
        p.XPath + N'/@' + x.value('local-name(.)','NVARCHAR(MAX)'),  
        1, 
        SPACE(2 * p.lvl - 1) + N'|' + REPLICATE('-', 1)  
        + N'@' + x.value('local-name(.)','NVARCHAR(MAX)'),  
        x.value('.','NVARCHAR(MAX)'),  
        NULL,  
        p.Sort,  
        p.ID + 1  
FROM CTE p  
CROSS APPLY this.nodes('/*/@*') a(x)  
)  

SELECT into [your table]
        ROW_NUMBER() OVER(ORDER BY Sort, ID) AS ID,  
        ParentName, ParentPosition,Depth, NodeName, Position,   
        NodeType, FullPath, XPath, TreeView, Value, XMLData 
FROM CTE2

相关问题