I am trying to find duplicate records by name and the date submitted.
For example, I want to get all records with same name of "SampleOne" and same submitted date of "2016-10-14 12:44 39 460" and exclude any data that doesn't match.
I am using the query shown below, but I'm getting other data too. From my screenshot below I need only rows 1- 4 and not show the ones that doesn't have the same name and same submitDate
.
SELECT TOP (13)
a.DocumentName, a.SubmitDateTime
FROM
Document AS a
INNER JOIN
(SELECT
DocumentName, SubmitDateTime, COUNT(*) AS DocCount
FROM
Document
GROUP BY
DocumentName, SubmitDateTime
HAVING
(COUNT(*) > 1)) AS dt ON a.DocumentName = dt.DocumentName
AND a.SubmitDateTime = dt.SubmitDateTime
LEFT OUTER JOIN
Staging b ON a.DocumentId = b.DocumentId
WHERE
b.DocumentId IS NULL
AND a.SubmitDateTime IS NOT NULL
AND a.InsertDateTime IS NOT NULL
ORDER BY
a.SubmitDateTime
Sample data
DocumentName | SubmitDateTime |
---|---|
SampleOne | 2016-10-14 12:44:39.460 |
SampleOne | 2016-10-14 12:44:39.460 |
SampleTwo | 2016-10-14 12:44:39.460 |
SampleTwo | 2016-10-14 12:44:39.460 |
SampleThree | 2016-10-14 12:45:32.813 |
SampleFive | 2016-10-14 12:46:23.423 |
Doc Name | 2016-10-17 15:15:05.523 |
Doc Name 2 | 2016-10-17 15:15:32.170 |
Doc Name | 2016-10-17 15:17:32.153 |
Doc Name | 2016-10-17 15:17:53.963 |
Doc Name | 2016-10-17 15:18:15.977 |
Document Text | 2016-10-17 15:19:22.950 |
Document Text | 2016-10-17 15:22:25.463 |
I have managed to get the results that I want by using the query below and needs suggestion if I am on the right track.
SELECT t.*
FROM (
SELECT
s.*
, COUNT(*) OVER (PARTITION BY s.DocumentName, s.SubmitDateTime, s.DocumentType) AS DocCount
FROM Document s
LEFT OUTER JOIN
Staging b ON S.DocumentId = b.DocumentId
WHERE
b.DocumentId IS NULL
AND s.SubmitDateTime IS NOT NULL
AND s.InsertDateTime IS NOT NULL
AND s.DocumentName IS NOT NULL
) t
WHERE t.DocCount> 1
ORDER BY t.DocumentName, t.SubmitDateTime
1条答案
按热度按时间zyfwsgd61#
Since you didn't provide any clear input or output I am assuming the following:
The subquery
t
will select only the necessary columnsDocumentName
andSubmitDateTime
.I replaced the
LEFT OUTER JOIN
withStaging
with aLEFT JOIN
for simplicity, assuming you don't require columns from theStaging
table in the final result. In theCOUNT(*)
function I included the OVER PARTITION to count duplicates only based onDocumentName
andSubmitDateTime
.