SQL Server Multiple STRING_AGG on multiple join columns causes bloated aggregation

gk7wooem  于 2023-06-04  发布在  其他
关注(0)|答案(2)|浏览(131)

I've got a table in my MSSQL server, lets call it blogPost . I've also got two tag tables, lets call them fooTag and barTag . The tag tables are used to tag the blogPost table which are identically structured.

blogPost

| postId | title               |        body |
+--------+---------------------+-------------+
| 1      | The life on a query | lorem ipsum |
+--------+---------------------+-------------+

fooTag and barTag

| postId | tagName      |
+--------+--------------+
| 1      | sql          |
| 1      | query        |
| 1      | select-query |
+--------+--------------+

I want to get a single blogpost along with all it's tags in a single row so then STRING_AGG() feels suitable to make a query like this:

SELECT blogPost.*, STRING_AGG(fooTag.tagName, ';') as [fooTags], STRING_AGG(barTag.tagName, ';') as [barTags]
FROM blogPost
LEFT JOIN fooTag ON blogPost.postId = fooTag.postId
LEFT JOIN barTag ON blogPost.postId = barTag.postId
WHERE postId = 1
GROUP BY blogPost.postId, title, body

When making this query I'd expect to get the result

| postId | title               |        body | fooTags                 | barTags                 |
+--------+---------------------+-------------+-------------------------+-------------------------+
| 1      | The life on a query | lorem ipsum | sql;query;select-query | sql;query;select-query |
+--------+---------------------+-------------+-------------------------+-------------------------+

But I'm getting this result instead where bar tags (i.e. the last STRING_AGG selected) are duplicated.

| postId | title               |        body | fooTags                 | barTags                                       |
+--------+---------------------+-------------+-------------------------+-----------------------------------------------+
| 1      | The life on a query | lorem ipsum | sql;query;select-query; | sql;sql;sql;query;query;query;select-query;select-query;select-query |
+--------+---------------------+-------------+-------------------------+-----------------------------------------------+

Putting barTags last in the SELECT statement makes it so that barTags gets the duplicates instead of fooTags . The amount of duplicates created seem to be bound to the amount of rows columns being aggregated together in the first STRING_AGG result column, so if fooTags has 5 rows to aggregate together there will be 5 duplicates of each barTag in the barTags column in the result.

How would I get the result I want without duplicates?

piok6c0g

piok6c0g1#

Your problem is caused by each row in fooTags creating that many rows of barTags in the JOIN , hence the duplication. You can work around this issue by performing the STRING_AGG in the footags and bartags tables beforeJOIN ing them:

SELECT blogPost.*, f.tags as [fooTags], b.tags as [barTags]
FROM blogPost
LEFT JOIN (SELECT postId, STRING_AGG(tagName, ';') AS tags
           FROM fooTag
           GROUP BY postId) f ON blogPost.postId = f.postId
LEFT JOIN (SELECT postId, STRING_AGG(tagName, ';') AS tags
           FROM barTag
           GROUP BY postId) b ON blogPost.postId = b.postId
WHERE postId = 1
hrirmatl

hrirmatl2#

You can simplify the query like so:

SELECT blogPost.*, ca1.*, ca2.*
FROM blogPost
OUTER APPLY (
    SELECT STRING_AGG(tagName, ';')
    FROM fooTag
    WHERE blogPost.postId = fooTag.postId
) AS ca1(fooTags)
OUTER APPLY (
    SELECT STRING_AGG(tagName, ';')
    FROM barTag
    WHERE blogPost.postId = barTag.postId
) AS ca2(barTags)
WHERE postId = 1

No GROUP BY required, in your case it'll be an expensive operation.

相关问题