hadoop - Look for matching names in two customer lists

Posted by flvlnr44 on 2021-06-03 in Hadoop


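I have two lists of people from different events; I would like to look for matching people's names in these lists, and also for matching companies. I know that each list may contain people with the same name who are not the same person, but this would still help to find matching people.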
First list example:
Name, Company, Title
John Doe, Acme Company, Elephant Tamer
Jane Smith, Acme Corporation, CEO
John Smith, Widgets-r-us, Janitor
+10000 rows
Second list example:
Name, Company
Fred Smith, Acme Company
John Smith, Widgets-r-us
John Smith, XYZ Company
Jane Smith, XYZ Company
+10000 rows
Desired output
Matching names:
John Smith
Jane Smith
Matching companies:
Acme Company
Widgets-r-us
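I am running this in an AWS environment and am still new to Hadoop. Any programming language is fine. I know how to do this in Excel, but I would like to be able to scale it over time with more lists of names (each in its own CSV file).
Thanks for your help!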

hadoop merge Match multifile

Source: https://stackoverflow.com/questions/16532936/hadoop-look-for-matching-names-in-two-customer-lists

1 Answer


sigwle7e1#

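You need a Mapper implementation that emits the person name (or the company name) as a Text key, together with an IntWritable value of 1:

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Some logic to derive the person name or the company name.
        // Text has no split(), so convert to String first; column 0 is the name.
        String name = value.toString().split(",")[0].trim();
        // Emit the extracted name, not the whole input line.
        context.write(new Text(name), new IntWritable(1));
    }

The reduce method in the Reducer would look something like this:

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Start at 0 and sum the emitted 1s.
        int count = 0;
        for (IntWritable val : values) {
            count += val.get();
        }
        // You get every unique name with the number of times it is repeated.
        context.write(key, new IntWritable(count));
    }

Both methods use LongWritable, Text and IntWritable from org.apache.hadoop.io, plus java.io.IOException. A name with a count greater than 1 occurs more than once in the input; note that this also counts repeats within a single list. Hope this helps.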
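A plain count cannot tell "twice in one list" apart from "once in each list". One possible refinement is to emit the ID of the source list instead of 1, and to keep only the keys that were seen in at least two different lists. The sketch below simulates that map/reduce flow in plain Java with no Hadoop classes, so it can be run locally. The class name ListMatcher, its helper methods, and the sample rows (modeled on the question) are illustrative assumptions, not the answerer's code. In a real job the list ID would have to come from somewhere such as the input file name, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical plain-Java simulation of the map/reduce match; no Hadoop classes are used. */
final class ListMatcher {

    /**
     * "Map" step: pull one column out of a CSV line and normalize it.
     * Assumes simple comma-separated rows with no quoted commas and no header row.
     */
    static String extractKey(String line, int column) {
        String[] fields = line.split(",");
        return fields[column].trim().toLowerCase();
    }

    /**
     * "Shuffle" and "reduce" steps: group the list IDs by key, then keep
     * only the keys that were seen in at least two different lists.
     */
    static List<String> matches(List<List<String>> lists, int column) {
        Map<String, Set<Integer>> listsByKey = new TreeMap<>();
        for (int listId = 0; listId < lists.size(); listId++) {
            for (String line : lists.get(listId)) {
                String key = extractKey(line, column);
                listsByKey.computeIfAbsent(key, k -> new TreeSet<>()).add(listId);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> entry : listsByKey.entrySet()) {
            if (entry.getValue().size() >= 2) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample rows modeled on the question (assumed data, for illustration only).
        List<String> first = Arrays.asList(
                "John Doe, Acme Company, Elephant Tamer",
                "Jane Smith, Acme Corporation, CEO",
                "John Smith, Widgets-r-us, Janitor");
        List<String> second = Arrays.asList(
                "Fred Smith, Acme Company",
                "John Smith, Widgets-r-us",
                "John Smith, XYZ Company",
                "Jane Smith, XYZ Company");
        List<List<String>> lists = Arrays.asList(first, second);
        System.out.println("Matching names: " + matches(lists, 0));
        System.out.println("Matching companies: " + matches(lists, 1));
    }
}
```

Because the keys are trimmed and lower-cased before grouping, "John Smith" and "john smith " are treated as the same name. Adding a third or fourth CSV file only means adding another entry to lists; the threshold of two distinct lists stays the same.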

2021-06-03
