How to get the last word count in an input file using a MapReduce program

Posted by 14ifxucb on 2021-05-29 in Hadoop


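Can anyone tell me what changes a simple word count program needs so that, using MapReduce, it outputs only the count of the last word in a file?
If the input file is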

hai hello world
hello world java
hadoop world hai
hello hai java

Expected output: world 3

This is because "world" is the last key after sorting.
Thanks for your help.
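To pin down the expected result, here is a minimal plain-Java sketch (no Hadoop involved, purely for illustration) that counts the words of the sample input, keeps them in sorted order, and reports the last one. The names `LastWordCount` and `lastWordCount` are hypothetical. For ASCII words like these, Java's `String` ordering matches the byte-wise ordering Hadoop uses for `Text` keys.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Hadoop) of the result the question asks for:
// count every word, sort the words, and keep only the last one.
class LastWordCount {

    // Returns "word count" for the lexicographically largest word,
    // or "" if the text contains no words.
    static String lastWordCount(String text) {
        // TreeMap keeps keys in ascending (natural String) order,
        // like the sorted keys a single reducer receives.
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        Map.Entry<String, Integer> last = counts.lastEntry();
        return last == null ? "" : last.getKey() + " " + last.getValue();
    }

    public static void main(String[] args) {
        String input = "hai hello world\n"
                + "hello world java\n"
                + "hadoop world hai\n"
                + "hello hai java\n";
        System.out.println(lastWordCount(input)); // prints "world 3"
    }
}
```

Note that "hai" and "hello" also occur three times; "world" wins only because it sorts last, which is exactly what the question asks for.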

hadoop mapreduce

Source: https://stackoverflow.com/questions/32581953/how-to-get-last-word-count-in-a-input-file-using-mapreduce-programme

2 answers


pzfprimi1#

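Set the number of reducers to 1 (job.setNumReduceTasks(1)). Override the default map-side sort so that keys are sorted in descending order, and register the comparator class in the driver code with job.setSortComparatorClass. With keys in descending order, the first key the single reducer sees is the one that would have come last in ascending order, so output only the first key/value pair that reaches reduce().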

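import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Sorts Text keys in descending order, so the largest word reaches the reducer first.
public class MysortComparator extends WritableComparator
{
    public MysortComparator()
    {
        // true: create key instances so compare(WritableComparable, WritableComparable) is used
        super(Text.class, true);
    }

    @Override
    @SuppressWarnings("rawtypes")
    public int compare(WritableComparable w, WritableComparable w1)
    {
        Text s = (Text) w;
        Text s1 = (Text) w1;
        // Reverse the natural (ascending) order of Text
        return -1 * s.compareTo(s1);
    }
}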
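As a rough illustration of the descending-sort idea, the following plain-Java sketch replaces Hadoop's shuffle with a `TreeMap` ordered by `Comparator.reverseOrder()`. The `DescendingKeys` class and its `group` method are hypothetical names, and no Hadoop classes are used; the point is only that the first key under the reversed order is the last key under the natural order.

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Plain-Java sketch (no Hadoop): reversing the key comparator, as
// MysortComparator does with -1 * compareTo, puts the largest key first.
class DescendingKeys {

    // Groups words and sums their counts, with keys ordered by the given
    // comparator; this stands in for the sort phase before a single reducer.
    static TreeMap<String, Integer> group(List<String> words, Comparator<String> order) {
        TreeMap<String, Integer> grouped = new TreeMap<>(order);
        for (String w : words) {
            grouped.merge(w, 1, Integer::sum);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> words = List.of("hai", "hello", "world", "hello", "world", "java",
                "hadoop", "world", "hai", "hello", "hai", "java");
        TreeMap<String, Integer> desc = group(words, Comparator.reverseOrder());
        // First key under descending order = last key under ascending order.
        System.out.println(desc.firstKey() + " " + desc.firstEntry().getValue()); // world 3
    }
}
```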

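You can also override the reducer's run() method so that it reads only the first key, passes it to reduce(), and ignores all remaining records. If your single reducer receives a large number of key/value pairs, this avoids the overhead of iterating over all of them.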

@Override
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  try {
    // Keys arrive in descending order, so the first key is the
    // lexicographically largest word; reduce it and skip the rest.
    if (context.nextKey()) {
      reduce(context.getCurrentKey(), context.getValues(), context);
    }
  } finally {
    cleanup(context);
  }
}
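The effect of overriding run() can be simulated without a cluster. In this sketch, `MiniReducer` is a hypothetical, simplified stand-in for the `Reducer` lifecycle (setup, one reduce per sorted key, cleanup), not Hadoop's real API, and `shuffleDescending` is a hypothetical helper that mimics the map and descending-sort phases. `FirstKeyOnlyReducer` calls `reduce()` once and never touches the remaining keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop). MiniReducer is a hypothetical stand-in
// for the reducer lifecycle: setup(), one reduce() per sorted key, cleanup().
class MiniReducer {
    final List<String> output = new ArrayList<>();
    int reduceCalls = 0;

    void setup() {}

    void cleanup() {}

    void reduce(String key, List<Integer> values) {
        reduceCalls++;
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        output.add(key + " " + sum);
    }

    // Default behaviour: reduce() is called once for every key.
    void run(NavigableMap<String, List<Integer>> sortedInput) {
        setup();
        try {
            for (Map.Entry<String, List<Integer>> e : sortedInput.entrySet()) {
                reduce(e.getKey(), e.getValue());
            }
        } finally {
            cleanup();
        }
    }

    // Simulated map + shuffle: emit (word, 1) and group the pairs by word,
    // with keys sorted in descending order like MysortComparator.
    static NavigableMap<String, List<Integer>> shuffleDescending(String text) {
        NavigableMap<String, List<Integer>> grouped = new TreeMap<>(Collections.reverseOrder());
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty()) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }
}

// Overridden run(): only the first (largest) key reaches reduce().
class FirstKeyOnlyReducer extends MiniReducer {
    @Override
    void run(NavigableMap<String, List<Integer>> sortedInput) {
        setup();
        try {
            Map.Entry<String, List<Integer>> first = sortedInput.firstEntry();
            if (first != null) {
                reduce(first.getKey(), first.getValue());
            }
        } finally {
            cleanup();
        }
    }

    public static void main(String[] args) {
        FirstKeyOnlyReducer r = new FirstKeyOnlyReducer();
        r.run(shuffleDescending("hai hello world\nhello world java\nhadoop world hai\nhello hai java"));
        System.out.println(r.output);      // [world 3]
        System.out.println(r.reduceCalls); // 1
    }
}
```

Skipping the remaining keys is what saves work when the single reducer receives many keys; the trade-off is that the answer is only correct if the descending comparator is actually registered.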

Posted 2021-05-30

af7jpaap2#

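There is a simple way that needs no explicit sorting.
Assuming you have one reducer running, you can override the cleanup() method.
A reducer's cleanup() method is meant for housekeeping at the end of the reduce task, but you can take advantage of it: cleanup() runs exactly once, after the last reduce() call. Keys reach reduce() in ascending order, so if reduce() only stores the current key and its count in instance fields (copying the key, because Hadoop reuses the same key object between calls), those fields hold the last key/value pair when the reduce task finishes. Instead of emitting output from reduce(), emit it from cleanup(): keep context.write() only in cleanup().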

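@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
    // lastWord (Text) and lastCount (IntWritable) are hypothetical fields that
    // reduce() overwrites on every call, so here they hold the last key and its count.
    context.write(lastWord, lastCount);
}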

I believe this does the job with very little effort: with a single reducer, these few lines give you the required result directly.
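The cleanup() approach can likewise be sketched in plain Java. `LastKeyInCleanupReducer`, its fields `lastWord`/`lastCount`, and `runOn` are hypothetical names, not Hadoop API; `runOn` stands in for the framework by grouping words in ascending order, calling `reduce()` once per key, and then calling `cleanup()` once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop) of the cleanup() approach.
// Keys arrive in the default ascending order; reduce() only remembers the
// current key and its sum, and cleanup() writes whatever was remembered last.
class LastKeyInCleanupReducer {
    final List<String> output = new ArrayList<>();

    // Hypothetical fields; in Hadoop the key must be copied (e.g. Text.set)
    // because the framework reuses the same key object between calls.
    private String lastWord = null;
    private int lastCount = 0;

    void reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        lastWord = key;   // overwritten on every call
        lastCount = sum;
    }

    void cleanup() {
        if (lastWord != null) { // nothing to write if no keys were reduced
            output.add(lastWord + " " + lastCount);
        }
    }

    // Mimics the reducer lifecycle: reduce() per sorted key, then cleanup() once.
    static List<String> runOn(String text) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>(); // ascending order
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty()) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        LastKeyInCleanupReducer r = new LastKeyInCleanupReducer();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            r.reduce(e.getKey(), e.getValue());
        }
        r.cleanup();
        return r.output;
    }

    public static void main(String[] args) {
        System.out.println(runOn("hai hello world\nhello world java\nhadoop world hai\nhello hai java"));
        // [world 3]
    }
}
```

As the answer assumes, this relies on a single reducer: with several reducers, each one would write the last key of its own partition.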

Posted 2021-05-30
