I have a table with 25,000 rows:
Table Audit (Id int identity(1,1), AdditionalInfo xml)
The sample data in the AdditionalInfo column for a row looks like this:
<Audit version="1">
  <Context name="Event">
    <Action name="OrganizationEventReceived">
      <Input>
        <Source type="SourceOrganizationId">77d2678b-ea4a-43ad-816b-c63edf206b08</Source>
        <Target type="TargetOrganizationId">b98fd3ae-dbcb-4826-9d92-7e445ad61273,b98fd3ae-dbcb-4826-9d92-7e445ad61273,b98fd3ae-dbcb-4826-9d92-7e445ad61273</Target>
      </Input>
    </Action>
  </Context>
</Audit>
I'd like to shred the XML and collect the data into an output dataset with the following query:
SELECT Id,
       p.value('(@name)[1]', 'nvarchar(100)') AS TargetAction,
       p.value('(Input/Source/text())[1]', 'nvarchar(500)') AS Source,
       p.value('(Input/Target/text())[1]', 'nvarchar(max)') AS Target
FROM dbo.Audit
CROSS APPLY AdditionalInfo.nodes('/Audit/Context/Action') AS AdditionalInfo(p)
The performance of the query is bad: it takes 15 seconds to return the result set for just 25,000 rows. Is there a better way of doing this? I even tried putting primary and secondary XML indexes on the AdditionalInfo column. Please help and let me know which SQL Server XQuery techniques would do this better.
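For reference, the XML indexes were created roughly along these lines (a minimal sketch; it assumes Id is the table's clustered primary key, which a primary XML index requires, and shows a PATH secondary index as just one of the possible kinds):

-- Primary XML index (requires a clustered primary key on the table; Id is assumed here)
CREATE PRIMARY XML INDEX PXML_Audit_AdditionalInfo
    ON dbo.Audit (AdditionalInfo);

-- Secondary XML index; FOR PATH shown as one example (VALUE and PROPERTY are the other kinds)
CREATE XML INDEX SXML_Audit_AdditionalInfo_Path
    ON dbo.Audit (AdditionalInfo)
    USING XML INDEX PXML_Audit_AdditionalInfo
    FOR PATH;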
Thanks,
1 Answer
vsaztqbk1#
Great question.
My recent task required parsing about 35,000 XML documents, a valid document being ~20 kB. More and larger XML documents tend to fill memory exponentially, so try to distribute your work:
- Target stores unstructured data, which eats most of the computing power because of its data type and the varying length of its values.
- CROSS APPLY matters: avoid a three-step path in nodes(); consider two steps plus recursion (see the splitting advice below).
- Split the work into batches, e.g. WHERE id IN (1,2,3), or work one document at a time inside a loop: DECLARE @xml_doc XML; SET @xml_doc = (SELECT xmldata FROM xmlsource WHERE id = 1); (a sketch of this batching idea follows the list).
- Number the shredded parts with ROW_NUMBER(), then LEFT JOIN all parts back to the list of XML documents using some identifier, such as xml_id.
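A minimal sketch of the batching and rejoining idea, under assumptions not spelled out above: rows are assigned to batches of 1,000 with ROW_NUMBER(), each batch is shredded separately into a temporary table, and the parts are LEFT JOINed back to dbo.Audit by Id (playing the role of xml_id). The batch size, temp table names, and variables are illustrative.

-- Assign each row to a batch of 1,000 documents using ROW_NUMBER()
IF OBJECT_ID('tempdb..#batches') IS NOT NULL DROP TABLE #batches;
SELECT Id,
       (ROW_NUMBER() OVER (ORDER BY Id) - 1) / 1000 AS BatchNo
INTO #batches
FROM dbo.Audit;

IF OBJECT_ID('tempdb..#shredded') IS NOT NULL DROP TABLE #shredded;
CREATE TABLE #shredded
(
    Id           int           NOT NULL,
    TargetAction nvarchar(100) NULL,
    Source       nvarchar(500) NULL,
    Target       nvarchar(max) NULL
);

DECLARE @batch    int = 0,
        @maxBatch int = (SELECT MAX(BatchNo) FROM #batches);

-- Shred one batch at a time instead of the whole table in a single statement
WHILE @batch <= @maxBatch
BEGIN
    INSERT INTO #shredded (Id, TargetAction, Source, Target)
    SELECT a.Id,
           p.value('(@name)[1]',               'nvarchar(100)') AS TargetAction,
           p.value('(Input/Source/text())[1]', 'nvarchar(500)') AS Source,
           p.value('(Input/Target/text())[1]', 'nvarchar(max)') AS Target
    FROM dbo.Audit AS a
    JOIN #batches  AS b ON b.Id = a.Id AND b.BatchNo = @batch
    CROSS APPLY a.AdditionalInfo.nodes('/Audit/Context/Action') AS x(p);

    SET @batch += 1;
END;

-- Rejoin the shredded parts to the source rows by identifier (Id plays the role of xml_id here)
SELECT a.Id, s.TargetAction, s.Source, s.Target
FROM dbo.Audit AS a
LEFT JOIN #shredded AS s ON s.Id = a.Id;

Smaller batches keep the memory footprint of each statement bounded; the right batch size depends on document size and available memory, so treat 1,000 as a starting point to tune.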