SQL Server How to delete duplicates if all the their linked values in another table (foreign key linked rows) are also the same

juzqafwq  于 12个月前  发布在  其他
关注(0)|答案(1)|浏览(79)

I have three tables:

x (column1,..., x_id)

y (column1, column2, colum3, ..., x_id, y_id)

z (column1, column2, colum3,..., x_id, y_id, z_id)

My final goal: delete duplicates in y table if all of their linked values (based on foreign keys) in z table are also exactly the same. For example, This is just an example.

For more clarification consider this example: y table: my table structure is very similar to this and this table is close to the reality of what I have.
| column 1 | column 2 | x_id | y_id |
| ------------ | ------------ | ------------ | ------------ |
| foo | bar | 1 | 1 |
| foo | bar | 1 | 2 |
| xx | yy | 2 | 3 |
| zz | kk | 2 | 4 |
| tt | mm | 2 | 5 |
| baz | qux | 3 | 6 |
| baz | qux | 3 | 7 |

has two groups with the same column 1 and column 2 and x_id values (the y_ids = (1 and 2) and y_id == (6,7).

Then for these rows, I want to check a second table to see if all the items are the same. meaning they have the same items in the z table. The items of the y_id = 1 and y_id = 2 are the same, so one of them in the y table should be deleted.
| {color:red;} column 1 | column 2 | column | x_id | y_id | z_id |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| foo | bar | qux | 1 | 1 | 1 |
| foo2 | bar2 | qux2 | 1 | 1 | 2 |
| foo | bar | qux | 1 | 2 | 3 |
| foo2 | bar2 | qux2 | 1 | 2 | 4 |
| baz | foo | qux | 3 | 6 | 5 |
| baz2 | foo2 | qux1 | 3 | 6 | 6 |
| baz3 | foo3 | qux2 | 3 | 6 | 7 |
| baz | foo | qux | 3 | 7 | 8 |
| baz2 | foo2 | qux1 | 3 | 7 | 9 |

I want to delete the row with y_id = 1 from the y table ( or y_id = 2, which duplicate row doesn't make a difference). The tables are very big so a fast solution would be great. I know I can join the tables, but I was hoping I could find another solution. A solution that can use the group by IDs instead and just look for the found items there inside the z table (third table)

mqkwyuun

mqkwyuun1#

I believe the below example will help me to delete the rows I want:

WITH cte AS (
    SELECT
        a.column1 AS a_column1,
        a.column2 AS a_column2,
        a.y_id AS a_y_id,
        a.x_id AS a_x_id,
        b.column1 AS a_column1,
        b.colum2 AS a_column2,
        b.y_id AS b_y_id ,
        b.x_id AS b_x_id 
    FROM YTable a
    JOIN YTable b
    ON a.column1 = b.column1 
        AND a.column2 = b.column2 
        AND a.x_id = b.x_id 
        AND a.y_id != b.y_id
    WHERE a.y_id < b.y_id
)
SELECT
    cte.*,
    CASE
        WHEN EXISTS (
            SELECT 1
            FROM (
                SELECT * FROM ZTable
                WHERE y_id = cte.y_id
            ) table1
            FULL JOIN (
                SELECT * FROM ZTable
                WHERE y_id = cte.y_id
            ) table2
            ON table1.colum1 = table2.column1
                AND table1.colum2 = table2.colum2
                AND table1.colum3 = table2.colum3
            WHERE table1.column1 IS NULL OR table2.column1 IS NULL
        )
        THEN 'Not Identical'
        ELSE 'Identical'
    END AS duplicated
FROM cte;

相关问题