Improving the identity mapper in WordCount

lymnna71 asked on 2021-05-29 in Hadoop

I wrote a map method that reads the map output of the WordCount example [1]. The example does not use the IdentityMapper.class that MapReduce provides, but this was the only working way I found to build an identity mapper for word count. The only problem is that this mapper takes far more time than I would like, and I'm starting to think I may be doing something redundant. Is there anything that could improve the WordCountIdentityMapper code?
[1] The identity mapper

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountIdentityMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line is a "word count" pair written by the first job.
        StringTokenizer itr = new StringTokenizer(value.toString());
        word.set(itr.nextToken());
        Integer val = Integer.valueOf(itr.nextToken());
        context.write(word, new IntWritable(val));
    }

    @Override
    public void run(Context context) throws IOException, InterruptedException {
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
    }
}

[2] The map class that generates the map output

public static class MyMap extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());

        // Emit (word, 1) for every token on the line.
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }

    @Override
    public void run(Context context) throws IOException, InterruptedException {
        try {
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
        } finally {
            cleanup(context);
        }
    }
}

Thanks,

arknldoa

arknldoa · answer 1

The solution was to replace the StringTokenizer with the indexOf() method. It works better, and I got noticeably better performance.
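
A minimal sketch of what that change might look like (my reconstruction, not the poster's actual code), assuming the first job wrote its output with the default TextOutputFormat, i.e. one "word<TAB>count" pair per line:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountIdentityMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final Text word = new Text();
    // Reused across records; assumes the value field is always a valid integer.
    private final IntWritable count = new IntWritable();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // TextOutputFormat writes "key<TAB>value", so a single indexOf() finds the split point.
        int sep = line.indexOf('\t');
        word.set(line.substring(0, sep));
        count.set(Integer.parseInt(line.substring(sep + 1)));
        context.write(word, count);
    }
}

Besides dropping the StringTokenizer, this reuses a single IntWritable instead of boxing through Integer.valueOf() and allocating a new IntWritable per record, which removes the remaining per-record garbage in the original mapper.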
