sql server Xquery节点值性能

ctehm74n  于 2022-12-10  发布在  SQL Server
关注(0)|答案(1)|浏览(152)

I have a table with 25,000 rows. Table Audit (Id int identity(1,1), AdditionalInfo xml) The sample data in AdditionalInfo column for a row looks like below

<Audit version="1">
  <Context name="Event">
    <Action name="OrganizationEventReceived">
      <Input>
        <Source type="SourceOrganizationId">77d2678b-ea4a-43ad-816b-c63edf206b08</Source>
        <Target type="TargetOrganizationId">b98fd3ae-dbcb-4826-9d92-7e445ad61273,b98fd3ae-dbcb-4826-9d92-7e445ad61273,b98fd3ae-dbcb-4826-9d92-7e445ad61273</Target>
      </Input>
    </Action>
  </Context>
</Audit>

I like to shred the xml and collect the data in output dataset with following query.

SELECT   Id,
         p.value('(@name)[1]', 'nvarchar (100)') AS TargetAction, 
         p.value('(Input/Source/text())[1]', 'nvarchar (500)') AS Source, 
         p.value('(Input/Target/text())[1]', 'nvarchar (max)') AS Target
FROM dbo.Audit  CROSS APPLY AdditionalInfo.nodes('/Audit/Context/Action') AS AdditionalInfo(p)

The performance of the query is bad. It is taking 15 seconds to give the result set for just 25,000 rows. Is there a better way of doing it. I even tried putting primary and secondary xml indexes on AdditionalInfo column. Please help and let me know, to use better sql server xquery techniques.
Thanks,

vsaztqbk

vsaztqbk1#

Great question.
My recent task requires to parse about 35'000 XML documents, valid document being ~20kB.
More and larger xml files tend to exponentially fill the memory:

  • 100 documents: 0:33
  • 1000 documents: 25:00 😵‍💫

Try to distribute your work:

  • Variable target stores unstructured data, which eats most of computing power due to the data type and different length in values
  • depth of nodes in CROSS APPLY matters: avoid triple nodes in nodes() , consider two nodes and recursion (see below on split)
  • batch mode: process several documents at time, WHERE id IN (1,2,3)
  • loop a list of documents, FOR ;
  • parse using local variables, such as DECLARE @xml_doc XML; SET @xml_doc = SELECT xmldata FROM xmlsource WHERE id=1;
  • avoid exporting xml node content, only write result values
  • parse all elements separately: saving order of elements using function ROW_NUMBER() , then LEFT JOIN all parts to xml documents list using some identifier, such as xml_id

相关问题