Get duplicates for two columns with LINQ

rqenqsqc  posted on 2022-12-06 in Other

LINQ drives me crazy. Why does the following query not return the duplicates, whereas it works with only one identifier? Where is my error?

' generate some test-data '
Dim source As New DataTable
source.Columns.Add(New DataColumn("RowNumber", GetType(Int32)))
source.Columns.Add(New DataColumn("Value1", GetType(Int32)))
source.Columns.Add(New DataColumn("Value2", GetType(Int32)))
source.Columns.Add(New DataColumn("Text", GetType(String)))
Dim rnd As New Random()
For i As Int32 = 1 To 100
    Dim newRow = source.NewRow
    Dim value = rnd.Next(1, 20)
    newRow("RowNumber") = i
    newRow("Value1") = value
    newRow("Value2") = (value + 1)
    newRow("Text") = String.Format("RowNumber{0}-Text", i)
    source.Rows.Add(newRow)
Next
' following query does not work, it always has Count=0 '
' although it works with only one identifier '
Dim dupIdentifiers = From row In source
         Group row By grp = New With {.Val1 = row("Value1"), .Val2 = row("Value2")}
         Into Group
         Where Group.Count > 1
         Select idGroup = New With {grp.Val1, grp.Val2, Group.Count}

Edit: Here is the complete solution, thanks to @Jon Skeet's answer :)

Dim dupKeys = From row In source
        Group row By grp = New With {Key .Val1 = CInt(row("Value1")), Key .Val2 = CInt(row("Value2"))}
        Into Group Where Group.Count > 1
        Select RowNumber = CInt(Group.FirstOrDefault.Item("RowNumber"))

Dim dupRows = From row In source
        Join dupKey In dupKeys 
        On row("RowNumber") Equals dupKey 
        Select row

If dupRows.Any Then
    ' create a new DataTable from the first duplicate rows '
    Dim dest = dupRows.CopyToDataTable
End If

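The main problem with the grouping was that I had to make the grouped fields Key properties. The next problem was getting the duplicate rows from the original table: because nearly every row has a duplicate (judged by the two fields), the resulting DataTable contained 99 of the 100 rows rather than only the 19 duplicate values. I needed to select only the first row of each duplicate group and join those rows back to the original table on the PK.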

Select RowNumber = CInt(Group.FirstOrDefault.Item("RowNumber"))

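While this works in my case, maybe someone can explain how to select only the duplicates from the original table if all I had were the composite keys.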

Edit: I have answered the last part of the question myself, so here is all I need:

Dim dups = From row In source
         Group By grp = New With {Key .Value1 = CInt(row("Value1")), Key .Value2 = CInt(row("Value2"))}
         Into Group Where Group.Count > 1
         Let Text = Group.First.Item("Text")
         Select Group.First

If dups.Any Then
      Dim dest = dups.CopyToDataTable
End If

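I need the Let keyword to keep the other columns in the same context and to return only the first row of each group of duplicates. That way I can use CopyToDataTable to create a DataTable from the duplicate rows.
Overall it takes only a few lines of code to find duplicates on multiple columns and create a DataTable of them (I can drop the second query that looked up the rows in the original table).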

Answer #1 (vx6bjr1n)

The problem is the way anonymous types work in VB: they are mutable by default, and only Key properties are included for hashing and equality. Try this:

Group row By grp = New With {Key .Val1 = row("Value1"), Key .Val2 = row("Value2")}

(In C# this wouldn't be a problem: anonymous types in C# are always immutable in all properties.)
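
To make the point about Key properties concrete, here is a minimal sketch that compares anonymous-type instances directly, without any DataTable or grouping. The module name KeyPropertyDemo and the sample values are made up for illustration; only the New With / Key syntax comes from the answer above.

```vbnet
Imports System

Module KeyPropertyDemo
    Sub Main()
        ' No Key properties: the members are mutable and take no part in
        ' Equals/GetHashCode, so two instances are equal only if they are
        ' the very same object.
        Dim a1 = New With {.Val1 = 1, .Val2 = 2}
        Dim a2 = New With {.Val1 = 1, .Val2 = 2}
        Console.WriteLine(a1.Equals(a2))   ' False

        ' Key properties: read-only, and compared member by member.
        Dim k1 = New With {Key .Val1 = 1, Key .Val2 = 2}
        Dim k2 = New With {Key .Val1 = 1, Key .Val2 = 2}
        Console.WriteLine(k1.Equals(k2))   ' True
    End Sub
End Module
```

Group By relies on this same equality check, which would explain why the query in the question puts every row into its own group and Count > 1 never matches.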

Answer #2 (nx7onnlm)

I used LINQ and C# to find the value pairs that occur more than once across two columns of an EF table:

var DuplicatesFoundInTable =
    entities.LocationDatas
        // Skip rows where either column is null or empty.
        .Where(c => c.TrailerNumber != null && c.CarrierName != null && c.TrailerNumber != string.Empty && c.CarrierName != string.Empty)
        // Group on the composite key; the element selector projects from the row itself.
        .GroupBy(o => new { o.TrailerNumber, o.CarrierName }, l => new { l.TrailerNumber, l.CarrierName })
        .Where(g => g.Count() > 1)
        .Select(y => y.Key)
        .ToList();

When I want to check whether an input is a duplicate (that is, whether an entry with the same values in both columns already exists):

// Check whether any row has the same TrailerNumber and CarrierName as the input.
bool AlreadyInTableComparer = entities.LocationDatas.Any(l => String.Compare(l.TrailerNumber, customer.TrailerNumber, StringComparison.InvariantCulture) == 0 && String.Compare(l.CarrierName, customer.CarrierName, StringComparison.InvariantCulture) == 0);
bool AlreadyInTable = entities.LocationDatas.Any(t => t.TrailerNumber == customer.TrailerNumber && t.CarrierName == customer.CarrierName);

Checking for duplicates in SQL Server (the statement that deletes the duplicates is commented out):

WITH CTE AS
(
    SELECT [TrailerNumber], [CarrierName],
           ROW_NUMBER() OVER (PARTITION BY TrailerNumber ORDER BY TrailerNumber, CarrierName) AS NumRows,
           ROW_NUMBER() OVER (PARTITION BY TrailerNumber, CarrierName ORDER BY CarrierName) AS NumRows2
    FROM [dbo].[LocationData] -- Please note, duplicates are shown in this table.
    WHERE TrailerNumber != '' AND CarrierName != ''
)
SELECT [TrailerNumber], [CarrierName], [NumRows2] FROM CTE WHERE NumRows2 > 1
--DELETE FROM CTE WHERE NumRows2 > 1  --Delete Duplicates.

A verification query in SQL Server to confirm that the CTE filters correctly:

SELECT TrailerNumber, CarrierName, COUNT(*) AS Duplicates
    FROM [dbo].[LocationData]
    WHERE TrailerNumber IS NOT NULL OR CarrierName IS NOT NULL 
    GROUP BY TrailerNumber, CarrierName
    HAVING COUNT(*) >1 AND TrailerNumber != '' AND CarrierName != ''
