When should I use OutputCollector and Context in Hadoop?

Posted by 2skhul33 on 2021-06-02 in Hadoop


In this article, I found the following mapper code for word count:

// Old API: MapReduceBase, Mapper, OutputCollector, and Reporter come from org.apache.hadoop.mapred.
public static class MapClass extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  // Emits (token, 1) for every whitespace-separated token in the input line.
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, one);
    }
  }
}

By contrast, the official tutorial provides this mapper:

// New API: Mapper is a class (not an interface) in org.apache.hadoop.mapreduce.
public static class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  // Emits (token, 1) for every whitespace-separated token in the input line.
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}

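So far I have only ever seen Context used to write output from the mapper to the reducer; I have never seen (or used) OutputCollector. I have read the documentation, but I don't understand how it is used or why you would use it.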

Java hadoop mapreduce

Source: https://stackoverflow.com/questions/46623041/when-should-i-use-outputcollector-and-context-in-hadoop

2 Answers


yquaqz18 (#1)

That is a fine solution, but I just use a one-line solution: int wordcount = string.split(" ").length - 1;
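For comparison, here is a minimal plain-Java sketch of counting words in a single in-memory string, with no Hadoop involved. The class name WordCountSketch and the method countWords are hypothetical names chosen for illustration. The sketch returns the number of whitespace-separated tokens, so it is not a literal transcription of the expression above, and it only applies when the whole text fits in one string rather than being spread across a cluster.

```java
// Hypothetical helper for illustration only; not part of Hadoop.
final class WordCountSketch {

    // Counts whitespace-separated tokens in a single in-memory string.
    static int countWords(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            // split() on an empty string would yield one empty token, so handle it here.
            return 0;
        }
        // "\\s+" treats any run of spaces, tabs, or newlines as one separator.
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("to be or not to be"));
    }
}
```

Splitting on the regular expression "\\s+" instead of a single space avoids empty tokens when words are separated by more than one space.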

Posted 2021-06-02

a6b3iqyw (#2)

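The two snippets use different MapReduce APIs: OutputCollector belongs to the old API (MRv1) and Context to the new one (MRv2). Strictly speaking, OutputCollector lives in the org.apache.hadoop.mapred package and Context in org.apache.hadoop.mapreduce; MRv1 and MRv2 properly name the runtimes, but the terms are often used loosely for the APIs as well.
The Java MapReduce API 1 (also known as MRv1) shipped with the initial versions of Hadoop. The flaw in those early versions was that the MapReduce framework handled both data processing and cluster resource management.
MapReduce 2, or next-generation MapReduce, was a long-awaited and much-needed upgrade to the scheduling, resource management, and execution machinery in Hadoop. Fundamentally, the improvements separate cluster resource management from MapReduce-specific logic; in later Hadoop versions this separation is implemented by YARN.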
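MRv1 uses OutputCollector and Reporter to communicate with the MapReduce system.
The MRv2 API makes extensive use of Context objects, which let user code communicate with the MapReduce system: the roles of JobConf, OutputCollector, and Reporter in the old API are unified by Context objects in MRv2.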
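The unification described above can be pictured with a toy model. The sketch below is plain Java and does not use Hadoop: ToyOutputCollector, ToyReporter, ToyContext, and ApiStyleSketch are hypothetical stand-ins invented for illustration, with String and int in place of Text and IntWritable. It only aims to show that the same tokenizing mapper can be written against two separate collaborators (old style) or against one context object (new style) and emit the same pairs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Toy stand-ins for illustration only; these are NOT the Hadoop classes.
interface ToyOutputCollector { void collect(String key, int value); }
interface ToyReporter { void setStatus(String status); }

// One object covering both roles, mirroring the idea behind the new API's Context.
interface ToyContext {
    void write(String key, int value);
    void setStatus(String status);
}

final class ApiStyleSketch {

    // Old-API style: output and status reporting arrive as separate parameters.
    static void mapOldStyle(String line, ToyOutputCollector output, ToyReporter reporter) {
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            output.collect(itr.nextToken(), 1);
        }
        reporter.setStatus("done");
    }

    // New-API style: a single context parameter carries both responsibilities.
    static void mapNewStyle(String line, ToyContext context) {
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            context.write(itr.nextToken(), 1);
        }
        context.setStatus("done");
    }

    // Runs the old-style mapper and returns the emitted pairs as "key=value" strings.
    static List<String> runOld(String line) {
        List<String> out = new ArrayList<>();
        mapOldStyle(line, (k, v) -> out.add(k + "=" + v), status -> { });
        return out;
    }

    // Runs the new-style mapper and returns the emitted pairs as "key=value" strings.
    static List<String> runNew(String line) {
        List<String> out = new ArrayList<>();
        mapNewStyle(line, new ToyContext() {
            @Override public void write(String key, int value) { out.add(key + "=" + value); }
            @Override public void setStatus(String status) { /* ignored in this sketch */ }
        });
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runOld("a b a"));
        System.out.println(runNew("a b a"));
    }
}
```

In real Hadoop code the choice is made by the package you import from, and the mapper, reducer, and driver should all come from the same API rather than mixing the two.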
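Use MapReduce 2 (MRv2). Here are the biggest advantages of Hadoop 2 over Hadoop 1:
A major advantage is that the Hadoop 2 architecture has no JobTracker or TaskTracker; the YARN ResourceManager and NodeManager take their place. This lets Hadoop 2 run processing models other than MapReduce and work around the high latency associated with MapReduce.
Hadoop 2 supports non-batch workloads as well as traditional batch processing.
Hadoop 2 introduced HDFS federation, which lets multiple NameNodes manage separate parts of the namespace in one cluster; together with NameNode high availability, also new in Hadoop 2, this tackles the single-point-of-failure problem of Hadoop 1.
MRv2 has many more advantages; see https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/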

Posted 2021-06-02
