Find multiple duplicate rows in SQL Server table by name and date

qacovj5a  于 2023-06-28  发布在  SQL Server
关注(0)|答案(1)|浏览(148)

I am trying to find duplicate records by name and the date submitted.

For example, I want to get all records with same name of "SampleOne" and same submitted date of "2016-10-14 12:44 39 460" and exclude any data that doesn't match.

I am using the query shown below, but I'm getting other data too. From my screenshot below I need only rows 1- 4 and not show the ones that doesn't have the same name and same submitDate .

SELECT TOP (13) 
    a.DocumentName, a.SubmitDateTime
FROM 
    Document AS a 
INNER JOIN
    (SELECT  
         DocumentName, SubmitDateTime, COUNT(*) AS DocCount
     FROM    
         Document
     GROUP BY 
         DocumentName, SubmitDateTime
     HAVING  
         (COUNT(*) > 1)) AS dt ON a.DocumentName = dt.DocumentName 
                               AND a.SubmitDateTime = dt.SubmitDateTime 
LEFT OUTER JOIN
    Staging b ON a.DocumentId = b.DocumentId
WHERE 
    b.DocumentId IS NULL  
    AND a.SubmitDateTime IS NOT NULL
    AND a.InsertDateTime IS NOT NULL
ORDER BY 
    a.SubmitDateTime

Sample data

DocumentNameSubmitDateTime
SampleOne2016-10-14 12:44:39.460
SampleOne2016-10-14 12:44:39.460
SampleTwo2016-10-14 12:44:39.460
SampleTwo2016-10-14 12:44:39.460
SampleThree2016-10-14 12:45:32.813
SampleFive2016-10-14 12:46:23.423
Doc Name2016-10-17 15:15:05.523
Doc Name 22016-10-17 15:15:32.170
Doc Name2016-10-17 15:17:32.153
Doc Name2016-10-17 15:17:53.963
Doc Name2016-10-17 15:18:15.977
Document Text2016-10-17 15:19:22.950
Document Text2016-10-17 15:22:25.463

I have managed to get the results that I want by using the query below and needs suggestion if I am on the right track.

SELECT    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.DocumentName, s.SubmitDateTime, s.DocumentType) AS DocCount
    FROM Document s
    LEFT OUTER JOIN
     Staging b ON S.DocumentId = b.DocumentId
     WHERE 
     b.DocumentId IS NULL 
     AND s.SubmitDateTime IS NOT NULL
     AND s.InsertDateTime IS NOT NULL
     AND s.DocumentName IS NOT NULL
    ) t
    
WHERE t.DocCount> 1
  
ORDER BY t.DocumentName, t.SubmitDateTime
zyfwsgd6

zyfwsgd61#

Since you didn't provide any clear input or output I am assuming the following:

The subquery t will select only the necessary columns DocumentName and SubmitDateTime .

I replaced the LEFT OUTER JOIN with Staging with a LEFT JOIN for simplicity, assuming you don't require columns from the Staging table in the final result. In the COUNT(*) function I included the OVER PARTITION to count duplicates only based on DocumentName and SubmitDateTime .

SELECT t.DocumentName, t.SubmitDateTime
FROM (
    SELECT s.DocumentName, s.SubmitDateTime, COUNT(*) OVER (PARTITION BY s.DocumentName, s.SubmitDateTime) AS DocCount
    FROM Document s
    LEFT JOIN Staging b ON s.DocumentId = b.DocumentId
    WHERE b.DocumentId IS NULL
        AND s.SubmitDateTime IS NOT NULL
        AND s.InsertDateTime IS NOT NULL
        AND s.DocumentName IS NOT NULL
) t
WHERE t.DocCount > 1
ORDER BY t.DocumentName, t.SubmitDateTime;

相关问题