SQL Server Counting pairs where one element is the same, but the other differs

cnjp1d6j  于 2023-10-15  发布在  其他
关注(0)|答案(2)|浏览(123)

I have a table with a long list of pairs (user, organisation), alongside other columns. I only want to see the pairs with the same user, but different organisation.
| User | Organisation | Other Data |
| ------------ | ------------ | ------------ |
| 1 | 1 | example1_1 |
| 2 | 1 | example2_1 |
| 3 | 1 | example3_1 |
| 3 | 1 | example3_1 |
| 3 | 2 | example3_2 |
| 3 | 1 | example3_1 |
| 4 | 1 | example4_1 |
| 4 | 2 | example4_2 |
| 4 | 3 | example4_3 |
| 4 | 1 | example4_1 |

My attempt with a ROW_NUMBER () OVER (PARTITION BY ...), was to just have a flag greater than 1 where these records have the same user ID but a different organisation ID. But I cannot make it to increment on different organisation IDs, it counts when the organisation ID is the same instead. The desired result is:

UserOrganisationOther Data
31example3_1
32example3_2
41example4_1
42example4_2
43example4_3

Would someone please be able to generalise it so that I can partition on a fixed number of elements, so that some of my choice stay fixed and others vary?

j8yoct9x

j8yoct9x1#

More nesting than I like, but this will do it:

SELECT [User], Organization, OtherData
FROM (
    select uo.*, row_number() over (partition by uo.[User], Organization order by OtherData) rn
    from UserOrgs uo
    where exists( 
        SELECT 1
        FROM [UserOrgs] uo0
        WHERE uo0.[User] = uo.[User]
        GROUP BY uo0.[User]
        HAVING COUNT(Organization) > 1
    ) 
) w -- window
WHERE rn = 1

See it (and two other options) work here:

https://dbfiddle.uk/vjJX9YX4

If there is more than one row for a given (User,Organization) pair, you can control which exact row is used for OtherData by changing the order by expression in the row_number() window.

It's also worth mentioning that if OtherData is a stand-in for multiple columns, this method will ensure all the columns shown in a result row come from the same source row. It is possible to craft a solution (shown in the link) using GROUP BY and an aggregate function for each additional column, in place of the row_number() layer. It removes a layer of nesting, which is nice, but it does not make this "same row" guarantee.

A note on the use of ALL CAPS for code in this answer.

SQL has a long-standing tradition (that seems to be fading) of using ALL CAPS for keywords. I have my own adaption of this. When nesting queries, I will alternate capitalization between nesting levels, to help make it obvious what level I'm looking at for a given expression. This is why the middle level is all lower-case, but the inner and outer levels use CAPS.

It never used to matter, but with the advent of window functions I've found a lot of queries have an extra level of nesting just to filter on a window result, and so it's been helpful in recent years. If SQL Server ever gets the QUALIFY keyword this might no longer be as useful.

5m1hhzi4

5m1hhzi42#

Using your example data:

create table #pairs
(
    [User] int,
    Organisation int,
    [Other Data] varchar(20)
)

insert into #pairs values
(1,1,'example1_1'),
(2,1,'example2_1'),
(3,1,'example3_1'),
(3,1,'example3_1'),
(3,2,'example3_2'),
(3,1,'example3_1'),
(4,1,'example4_1'),
(4,2,'example4_2'),
(4,3,'example4_3'),
(4,1,'example4_1')

This query gets your desired results:

select distinct
    [User],
    Organisation,
    [Other Data]
from #pairs
where [User] in
(
    select [User]
    from #pairs
    group by [User]
    having count(distinct Organisation) > 1
)

Results:

UserOrganisationOther Data
31example3_1
32example3_2
41example4_1
42example4_2
43example4_3

However, this assumes that for a given pair of User and Organisation values, there will only be one distinct Other Data value. If this is not the case, you need to update your question to explain how you want to determine which Other Data value to return.

相关问题