Create an empty hash set for the intersection, with a hash function that rarely collides on email addresses
Create an empty hash set for the first difference, with a similar hash function
Create an empty hash set for the second difference, with a similar hash function
Iterate through the first list:
    Add the current element to the first difference hash set
End Iterate
Iterate through the second list:
    If the current element exists in the intersection hash set:
        Remove the current element from the first difference hash set
        Remove the current element from the second difference hash set
    Else If the current element exists in the first difference hash set:
        Remove the current element from the first difference hash set
        Remove the current element from the second difference hash set
        Add the current element to the intersection hash set
    Else:
        Add the current element to the second difference hash set
    End If
End Iterate
Process the intersection hash set as the solution
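The pseudocode above can be sketched in Java roughly as follows. This is an illustrative translation, not the answerer's code: `ThreeSetDiff` and `compare` are hypothetical names, and the removals in the "already in the intersection" branch are dropped because such an address has already been taken out of both difference sets, so they would be no-ops.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ThreeSetDiff {
    // Results of one pass: common addresses plus the addresses unique to each list.
    final Set<String> intersection = new HashSet<>();
    final Set<String> onlyFirst = new HashSet<>();
    final Set<String> onlySecond = new HashSet<>();

    static ThreeSetDiff compare(List<String> first, List<String> second) {
        ThreeSetDiff d = new ThreeSetDiff();
        // Every address of the first list starts out as "only in the first list".
        d.onlyFirst.addAll(first);
        for (String mail : second) {
            if (d.onlyFirst.remove(mail)) {
                d.intersection.add(mail);   // was in the first list: common address
            } else if (!d.intersection.contains(mail)) {
                d.onlySecond.add(mail);     // never seen in the first list
            }
            // otherwise: a repeat of an address already known to be common
        }
        return d;
    }

    public static void main(String[] args) {
        ThreeSetDiff d = compare(
            Arrays.asList("a@b.com", "1@2.com"),
            Arrays.asList("1@2.com", "asd@f.com", "qwer@ty.com"));
        // TreeSet copies are used only to print in sorted order
        System.out.println(new TreeSet<>(d.intersection)); // [1@2.com]
        System.out.println(new TreeSet<>(d.onlyFirst));    // [a@b.com]
        System.out.println(new TreeSet<>(d.onlySecond));   // [asd@f.com, qwer@ty.com]
    }
}
```

Each address is touched a constant number of times with expected O(1) hash-set operations, which is where the O(n) estimate comes from.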
import java.util.*;

public class CommonMails {
    public static void main(String[] args) {
        // The first two lines are test data only, not part of the algorithm:
        List<String> l1 = Arrays.asList("a@b.com", "1@2.com");
        List<String> l2 = Arrays.asList("1@2.com", "asd@f.com", "qwer@ty.com");
        // Load the first list into a HashSet: each lookup is O(1) on average
        Set<String> s1 = new HashSet<>(l1);
        for (String s : l2) {
            if (s1.contains(s)) System.out.println(s); // address is in both lists
        }
    }
}
If you want to use Hadoop, you can find the common mails as follows:
map(set):
    for each mail in set:
        emit(mail, 1)
reduce(mail, list<1>):
    if size(list) > 1:
        emit(mail)
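To check the logic without a cluster, the map, shuffle and reduce steps above can be simulated in a single Java process. This sketch does not use the Hadoop API: `MapReduceCommonMails`, `map` and `commonMails` are hypothetical names, and a `TreeMap` stands in for the framework's shuffle/sort step. Because each input is a set, an address emits at most one `1` per input, so a group with more than one value means the address occurs in both inputs.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class MapReduceCommonMails {
    // Map phase: for each mail in one input set, emit the pair (mail, 1).
    static List<Map.Entry<String, Integer>> map(Set<String> mails) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String mail : mails) {
            out.add(new AbstractMap.SimpleEntry<>(mail, 1));
        }
        return out;
    }

    // Runs map on both sets, groups the pairs by key, then applies the reduce rule.
    static List<String> commonMails(Set<String> first, Set<String> second) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>(map(first));
        emitted.addAll(map(second));

        // Shuffle phase (simulated): collect all values emitted for the same mail.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : emitted) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }

        // Reduce phase: more than one value means the mail came from both sets.
        List<String> common = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            if (group.getValue().size() > 1) {
                common.add(group.getKey());
            }
        }
        return common;
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("a@b.com", "1@2.com"));
        Set<String> b = new HashSet<>(Arrays.asList("1@2.com", "asd@f.com", "qwer@ty.com"));
        System.out.println(commonMails(a, b)); // [1@2.com]
    }
}
```

If the inputs were raw lists that may repeat an address, the reducer would instead need to count distinct sources (for example, by emitting a list identifier rather than 1), otherwise a duplicate inside one list would be reported as common.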
2 answers

Answer #1 (6ovsh4lw)
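Does this work for you? It should run in O(n) expected time, where n is the total number of addresses in both lists.
The nice thing about it is that it gives you both the intersection and the differences. It can be extended to track the differences among any number of lists.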
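One possible way to extend the idea to any number of lists, sketched in Java under its own assumptions: instead of separate difference sets, record for each address the indices of the lists it occurs in. An address found in all k lists belongs to the intersection, and one found in exactly one list is unique to that list. `MultiListDiff`, `occurrences`, `inAll` and `onlyIn` are hypothetical names; results are copied into a `TreeSet` only so that they print in sorted order.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MultiListDiff {
    // Maps each address to the indices of the lists that contain it.
    static Map<String, Set<Integer>> occurrences(List<List<String>> lists) {
        Map<String, Set<Integer>> where = new HashMap<>();
        for (int i = 0; i < lists.size(); i++) {
            for (String mail : lists.get(i)) {
                where.computeIfAbsent(mail, k -> new HashSet<>()).add(i);
            }
        }
        return where;
    }

    // Addresses present in every list (the k-way intersection).
    static Set<String> inAll(List<List<String>> lists) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> e : occurrences(lists).entrySet()) {
            if (e.getValue().size() == lists.size()) result.add(e.getKey());
        }
        return result;
    }

    // Addresses that occur only in the list at position `index`.
    static Set<String> onlyIn(List<List<String>> lists, int index) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> e : occurrences(lists).entrySet()) {
            Set<Integer> ids = e.getValue();
            if (ids.size() == 1 && ids.contains(index)) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> lists = Arrays.asList(
            Arrays.asList("a@b.com", "1@2.com"),
            Arrays.asList("1@2.com", "asd@f.com"),
            Arrays.asList("1@2.com", "qwer@ty.com", "a@b.com"));
        System.out.println(inAll(lists));     // [1@2.com]
        System.out.println(onlyIn(lists, 1)); // [asd@f.com]
    }
}
```

Building the occurrence map is still a single pass over all addresses; only the sorted output adds a logarithmic factor.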
Answer #2 (bvuwiixz)
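If you treat each list as a set, the common addresses are given by the set intersection A ∩ B. The "unique" addresses (those that appear in only one of the lists) are given by:
(A ∪ B) − (A ∩ B), i.e. the symmetric difference of A and B.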
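This is easy to do in any high-level language such as Java; have a look at Apache Commons Collections'
CollectionUtils.intersection()
for example. If the lists are not too large (they fit in memory), you can do it in memory as follows (Java code):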
If you want to use Hadoop: by running map over both sets and reducing over the mappers' output, you get the common elements.
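The set view from the second answer maps directly onto standard `java.util.Set` operations: `retainAll` gives the intersection, and the symmetric difference is the union minus the intersection. A minimal sketch; `SetViews`, `common` and `unique` are hypothetical helper names, and `TreeSet` is used only to get sorted, predictable output.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetViews {
    // Common addresses: A intersect B
    static Set<String> common(Collection<String> a, Collection<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(new HashSet<>(b)); // keep only elements also in b
        return result;
    }

    // "Unique" addresses: (A union B) minus (A intersect B), the symmetric difference
    static Set<String> unique(Collection<String> a, Collection<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.addAll(b);
        result.removeAll(common(a, b));
        return result;
    }

    public static void main(String[] args) {
        List<String> l1 = Arrays.asList("a@b.com", "1@2.com");
        List<String> l2 = Arrays.asList("1@2.com", "asd@f.com", "qwer@ty.com");
        System.out.println(common(l1, l2)); // [1@2.com]
        System.out.println(unique(l1, l2)); // [a@b.com, asd@f.com, qwer@ty.com]
    }
}
```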