MapReduce: What is the correct way to perform multiple operations on a text file, but execute them as a single Hadoop job?

Posted by xu3bshqb on 2021-06-01 in Hadoop


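I want to perform several operations on a single text file.
For example: Task 1: count all the words
Task 2: count the words that end with a specific character
Task 3: count the words that occur more than once.
What is the best way to achieve this?
Do I need to write multiple mappers and multiple reducers? Multiple mappers and a single reducer? Or can it be done with a single mapper and a single reducer?
It would be great if someone could provide a programming example.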

hadoop mapreduce hadoop2 reducers Mapper

Source: https://stackoverflow.com/questions/49307550/what-is-the-correct-way-to-perform-multiple-operations-on-a-text-file-but-exec

1 answer


ajsxfq5m1#

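Use counters to count what you are looking for. Once the MapReduce job has finished, just read the counters in the driver class.
For example, the total number of words and the number of words starting with "z" or "Z" can be counted in the mapper: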

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            word.set(token);
            // Every token counts towards the total number of words.
            context.getCounter("my_counters", "TOTAL_WORDS").increment(1);
            // Case-insensitive check for words starting with "z" or "Z".
            if (token.toUpperCase().startsWith("Z")) {
                context.getCounter("my_counters", "Z_WORDS").increment(1);
            }
            context.write(word, one);
        }
    }
}

The number of distinct words and the number of words appearing less than 4 times can be counted with counters in the reducer:

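import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int wordCount = 0;
        // reduce() is called once per distinct key, so each call is one distinct word.
        context.getCounter("my_counters", "DISTINCT_WORDS").increment(1);
        for (IntWritable val : values) {
            wordCount += val.get();
        }
        if (wordCount < 4) {
            context.getCounter("my_counters", "WORDS_LESS_THAN_4").increment(1);
        }
        // Nothing is written to the job output here; only the counters are updated.
    }
}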

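Read the counters in the driver class. The code below goes after the line where you submit the job and wait for it to finish (for example, after job.waitForCompletion(true) returns):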

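// Requires org.apache.hadoop.mapreduce.Counter and org.apache.hadoop.mapreduce.CounterGroup.
// "job" is the org.apache.hadoop.mapreduce.Job instance that has already completed.
CounterGroup group = job.getCounters().getGroup("my_counters");

for (Counter counter : group) {
    System.out.println(counter.getName() + "=" + counter.getValue());
}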
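
To sanity-check what the four counters should report before running anything on a cluster, the mapper and reducer logic above can be mimicked in plain Java without Hadoop. This is only a rough local sketch: the class name LocalCounterSketch, the method countAll, and the sample sentence are made up for illustration and are not part of Hadoop or of the answer. An in-memory sorted map stands in for the shuffle that groups values by key.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper (not part of Hadoop): mimics the counter logic locally.
final class LocalCounterSketch {

    static Map<String, Long> countAll(String text) {
        // Stand-in for the "my_counters" counter group.
        Map<String, Long> counters = new LinkedHashMap<>();
        counters.put("TOTAL_WORDS", 0L);
        counters.put("Z_WORDS", 0L);
        counters.put("DISTINCT_WORDS", 0L);
        counters.put("WORDS_LESS_THAN_4", 0L);

        // "Map" phase: tokenize, bump the mapper-side counters, and
        // group the emitted (word, 1) pairs by key.
        Map<String, Integer> grouped = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            counters.merge("TOTAL_WORDS", 1L, Long::sum);
            if (token.toUpperCase().startsWith("Z")) {
                counters.merge("Z_WORDS", 1L, Long::sum);
            }
            grouped.merge(token, 1, Integer::sum);
        }

        // "Reduce" phase: one iteration per distinct key.
        for (int wordCount : grouped.values()) {
            counters.merge("DISTINCT_WORDS", 1L, Long::sum);
            if (wordCount < 4) {
                counters.merge("WORDS_LESS_THAN_4", 1L, Long::sum);
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        String sample = "zebra apple Zoo apple apple apple zebra pear";
        // Prints one NAME=value line per counter, like the driver loop above.
        countAll(sample).forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

For the sample sentence this prints TOTAL_WORDS=8, Z_WORDS=3, DISTINCT_WORDS=4 and WORDS_LESS_THAN_4=3. Keys are case-sensitive, so "Zoo" and "zoo" would count as different words, just as they would with Text keys in the real job. To match the tasks in the question more literally, the mapper condition could use endsWith(...) with the character of interest, and the reducer condition could be wordCount > 1 for words that occur more than once.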

Answered on 2021-06-01
