java - How to filter 2 huge lists with millions of items in them with the same id

Posted by 7vhp5slm on 2021-07-03 in Java


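This question already has answers here:

Java streams - Get a "symmetric difference list" from two other lists (6 answers)
Java 8 Distinct by property (29 answers)
Remove duplicates from a list of objects based on property in Java 8 [duplicate] (9 answers)
Closed last month.
These are my two lists, each with millions of items. Both contain the same items with the same ids. The id is a String. I only need the items whose ids are not the same. This is how I do it, but I am sure there must be a better solution with higher performance: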

List<Transaction> differentList = new ArrayList<>();

    for(Transaction tx : foundTransactions ){
        for(Transaction aTx : ArchivedTransactions) 
        {
            if(!tx.getId().equalsIgnoreCase(aTx.getId()) ){
                differentList .add(tx);
            }
        }
    }

I tried to use streams but could not get it to work. I think the Stream API should be a better fit. Please suggest any improvements.

Java List java-stream

Source: https://stackoverflow.com/questions/65143461/how-to-filter-2-huge-list-with-millions-of-item-in-it-with-same-id

2 Answers

nbnkbykc #1

The simplest solution I can think of is to use a Set, which automatically discards duplicate elements.

Set<Transaction> result = new LinkedHashSet<>();
result.addAll(foundTransactions);
result.addAll(ArchivedTransactions);

//If you want to get a List<Transaction>
List<Transaction> differentList = new ArrayList<>(result);

Note: I used a LinkedHashSet to preserve insertion order. If insertion order does not matter to you, you can use a HashSet.

Answered 2021-07-03

vbopmzt1 #2

You can try collecting the ids into a hash-based Set (a HashSet) first, something like:

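Set<String> collect = ArchivedTransactions.stream().map(i -> i.getId().toLowerCase())
                                           .collect(Collectors.toSet());

List<Transaction> differentList = new ArrayList<>();
for (Transaction tx : foundTransactions)
    // Lower-case the id here too, so the lookup matches the lower-cased ids in the set.
    if (!collect.contains(tx.getId().toLowerCase()))
        differentList.add(tx);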

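Collectors.toSet() returns a HashSet in current JDK implementations (the API itself makes no guarantee about the concrete Set type). You can simplify the code to: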

Set<String> collect = ArchivedTransactions.stream().map(i -> i.getId().toLowerCase())
                                          .collect(Collectors.toSet());

// Lower-case the id before the lookup so it matches the lower-cased ids in the set.
List<Transaction> differentList = foundTransactions.stream()
                                                   .filter(tx -> !collect.contains(tx.getId().toLowerCase()))
                                                   .collect(Collectors.toList());
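
For reference, here is a self-contained sketch of the same idea that can be compiled and run on its own. The `Transaction` class below is a hypothetical stand-in (the question only shows that it has a `getId()` method), and the names `TransactionDiff` and `notArchived` are made up for illustration. It also assumes that lower-casing with `Locale.ROOT` is an acceptable substitute for `equalsIgnoreCase`, which holds for plain ASCII ids.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

class TransactionDiff {

    // Hypothetical stand-in for the question's Transaction; only getId() is assumed.
    static final class Transaction {
        private final String id;
        Transaction(String id) { this.id = id; }
        String getId() { return id; }
    }

    // Returns the items of `found` whose id (ignoring case) does not occur in `archived`.
    static List<Transaction> notArchived(List<Transaction> found, List<Transaction> archived) {
        // One pass over `archived` to build the lookup set of normalized ids.
        Set<String> archivedIds = archived.stream()
                .map(tx -> tx.getId().toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
        // One pass over `found`; each contains() call is O(1) on average.
        return found.stream()
                .filter(tx -> !archivedIds.contains(tx.getId().toLowerCase(Locale.ROOT)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Transaction> found = Arrays.asList(
                new Transaction("A1"), new Transaction("b2"), new Transaction("C3"));
        List<Transaction> archived = Arrays.asList(
                new Transaction("a1"), new Transaction("B2"));
        List<String> ids = notArchived(found, archived).stream()
                .map(Transaction::getId)
                .collect(Collectors.toList());
        System.out.println(ids); // prints [C3]
    }
}
```

Normalizing the ids on both sides (when building the set and when probing it) is what keeps the comparison case-insensitive.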

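Adding the ids to a HashSet first, as an intermediate step, gives you a much better overall time complexity, because (source):
Time complexity of HashSet operations: the underlying data structure of a HashSet is a hash table, so the amortized (average or usual case) time complexity of the add, remove and look-up (contains method) operations is O(1).
Therefore the time complexity of the HashSet solution is O(N + M), where N and M are the number of elements in the lists ArchivedTransactions and foundTransactions, respectively. Space-wise, however, you pay the price of keeping that extra structure.
Your solution is better space-wise but worse in time complexity. If N = M, the time complexity of your solution is O(N^2), whereas the HashSet solution is O(2N), hence O(N). That is a huge difference.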
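
To put rough numbers on that difference, take N = M = 10^6 as an illustrative size (the question only says "millions"), and count one id comparison or one hash operation as a unit of work:

```latex
\begin{align*}
\text{nested loops:}\quad & N \cdot M = 10^{6} \cdot 10^{6} = 10^{12} \text{ comparisons} \\
\text{HashSet:}\quad & N + M = 10^{6} + 10^{6} = 2 \times 10^{6} \text{ operations} \\
\text{ratio:}\quad & \frac{10^{12}}{2 \times 10^{6}} = 5 \times 10^{5}
\end{align*}
```

This counts operations only; actual running times also depend on hashing cost and memory behaviour.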
Note that you cannot simply do:

Set<Transaction> result = new LinkedHashSet<>();
result.addAll(foundTransactions);
result.addAll(ArchivedTransactions);

because you explicitly asked for:

!tx.getId().equalsIgnoreCase(aTx.getId())
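
A small sketch of what the plain Set union produces may make this concrete. It assumes a hypothetical `Transaction` whose `equals`/`hashCode` compare the id exactly (case-sensitively); the real class may define equality differently or not at all, and the names `UnionDemo` and `unionIds` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UnionDemo {

    // Hypothetical Transaction: equality is an exact, case-sensitive match on id.
    static final class Transaction {
        private final String id;
        Transaction(String id) { this.id = id; }
        String getId() { return id; }
        @Override public boolean equals(Object o) {
            return o instanceof Transaction && id.equals(((Transaction) o).id);
        }
        @Override public int hashCode() { return id.hashCode(); }
    }

    // Merges both lists into a LinkedHashSet and returns the ids in insertion order.
    static List<String> unionIds(List<Transaction> found, List<Transaction> archived) {
        Set<Transaction> result = new LinkedHashSet<>(found);
        result.addAll(archived);
        List<String> ids = new ArrayList<>();
        for (Transaction tx : result) {
            ids.add(tx.getId());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Transaction> found = Arrays.asList(
                new Transaction("A1"), new Transaction("C3"), new Transaction("D4"));
        List<Transaction> archived = Arrays.asList(
                new Transaction("a1"), new Transaction("C3"));
        // The union keeps every distinct item and treats "A1" and "a1" as different.
        System.out.println(unionIds(found, archived)); // prints [A1, C3, D4, a1]
    }
}
```

With the same input, a case-insensitive comparison of ids would leave only D4 as the item of the found list that is missing from the archived list, so the union is not a substitute for the filter.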

Answered 2021-07-03
