Reducer receives identical value multiple times instead of expected input

Posted by yhxst69z on 2021-05-30 in Hadoop

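While writing a MapReduce job in a local Hadoop environment, I ran into a problem: the reducer does not receive the values I expect. I have boiled the problem down to the following:
I create an arbitrary input file with 10 lines, so that the map method is executed 10 times. In the mapper, I keep an invocation count and write that count to the output as the value, with 0 as the key if the value is even and 1 as the key if it is odd, i.e. the following (key, value) pairs:
(1,1), (0,2), (1,3), (0,4), (1,5), etc.
I would expect two calls to the reducer, with
0 > [2,4,6,8,10]
1 > [1,3,5,7,9]
but instead I get two calls with
0 > [2,2,2,2,2]
1 > [1,1,1,1,1]
It seems that I receive the first value written in the mapper, repeated as many times as the key occurs (if I reverse the counter, I receive the values 10 and 9 instead of 2 and 1). As far as I understand, this is not the expected behavior (?), but I cannot figure out what I am doing wrong.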
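To make the expectation concrete, the grouping can be reproduced without Hadoop. The following is a minimal sketch and not part of the original job: `ExpectedGrouping` and `groupByParity` are hypothetical names, and a `TreeMap` merely stands in for the shuffle's sort by key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hadoop-free sketch of the grouping the question expects.
// ExpectedGrouping and groupByParity are hypothetical names used for illustration only.
final class ExpectedGrouping {

    // Emits (count % 2, count) for count = 1..n and groups the values by key,
    // roughly the way the shuffle phase would present them to the reducer.
    static Map<Integer, List<Integer>> groupByParity(int n) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int count = 1; count <= n; count++) {
            groups.computeIfAbsent(count % 2, k -> new ArrayList<>()).add(count);
        }
        return groups;
    }

    public static void main(String[] args) {
        // One line per key, mirroring the reducer's key|values debug output.
        groupByParity(10).forEach((key, values) -> System.out.println(key + "|" + values));
    }
}
```

In this sketch the values appear in insertion order; Hadoop itself does not guarantee any particular order of values within a key, so only the contents of each group are part of the expectation.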
I use the following mapper and reducer:

public class TestMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    int count = 0;

    @Override
    protected void map(LongWritable keyUnused, Text valueUnused, Context context) throws IOException, InterruptedException {
        count += 1;
        context.write(new IntWritable(count % 2), new IntWritable(count));

        System.err.println((count % 2) + "|" + count);
    }
}

public class TestReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable>{
    @Override
    protected void reduce(IntWritable key, Iterable<IntWritable> valueItr, Context context) throws IOException, InterruptedException {
        List<IntWritable> values = Lists.newArrayList(valueItr);

        System.err.println(key + "|" + values);
    }
}

I run the Hadoop job with the local test driver described in the book "Hadoop: The Definitive Guide" (O'Reilly):

public class TestDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.printf("Usage: %s [generic options] <input> <output>\n",
                    getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.err);
            return -1;
        }

        Job jobConf = Job.getInstance(getConf());
        jobConf.setJarByClass(getClass());
        jobConf.setJobName("TestJob");  

        jobConf.setMapperClass(TestMapper.class);
        jobConf.setReducerClass(TestReducer.class);

        FileInputFormat.addInputPath(jobConf, new Path(args[0]));
        FileOutputFormat.setOutputPath(jobConf, new Path(args[1]));

        jobConf.setOutputKeyClass(IntWritable.class);
        jobConf.setOutputValueClass(IntWritable.class);

        return jobConf.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new TestDriver(), args));
    }
}

Packaged into a jar and run with "hadoop jar test.jar infle.txt /tmp/testout".

hadoop mapreduce

Source: https://stackoverflow.com/questions/27172371/reducer-receives-identical-value-multiple-times-instead-of-expected-input

1 Answer

dohp0rv51#

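Hadoop reuses the same value object while streaming the values to the reducer.
So, in order to capture all the distinct values, you need to copy each one: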

@Override
protected void reduce(IntWritable key, Iterable<IntWritable> valueItr, Context context) throws IOException, InterruptedException {
    List<IntWritable> values = Lists.newArrayList();
    for (IntWritable writable : valueItr) {
        // Copy the current value; the iterator hands back the same object, refilled, on the next step.
        values.add(new IntWritable(writable.get()));
    }

    System.err.println(key + "|" + values);
}
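
The effect can be reproduced without Hadoop. The following is a minimal sketch under stated assumptions: `MutableInt` stands in for `IntWritable` and `ReusingIterable` for the reducer's value `Iterable`. Both are hypothetical classes written for this illustration, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hadoop-free sketch of the object-reuse pitfall.
// MutableInt and ReusingIterable are hypothetical stand-ins, not Hadoop classes.
final class ReuseDemo {

    // A mutable box, analogous to a Writable whose contents are overwritten in place.
    static final class MutableInt {
        private int value;
        int get() { return value; }
        void set(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    // Returns the SAME MutableInt from every next() call, refilled with the next number.
    static final class ReusingIterable implements Iterable<MutableInt> {
        private final int[] data;
        ReusingIterable(int... data) { this.data = data; }

        @Override public Iterator<MutableInt> iterator() {
            return new Iterator<MutableInt>() {
                private final MutableInt shared = new MutableInt();
                private int index = 0;
                @Override public boolean hasNext() { return index < data.length; }
                @Override public MutableInt next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    shared.set(data[index++]);
                    return shared;
                }
            };
        }
    }

    // Stores references: every list element is the one shared object.
    static List<MutableInt> collectReferences(Iterable<MutableInt> values) {
        List<MutableInt> out = new ArrayList<>();
        for (MutableInt v : values) out.add(v);
        return out;
    }

    // Stores copies: each element keeps the value it had when it was read.
    static List<Integer> collectCopies(Iterable<MutableInt> values) {
        List<Integer> out = new ArrayList<>();
        for (MutableInt v : values) out.add(v.get());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectReferences(new ReusingIterable(2, 4, 6, 8, 10)));
        System.out.println(collectCopies(new ReusingIterable(2, 4, 6, 8, 10)));
    }
}
```

In this sketch the first line printed is `[10, 10, 10, 10, 10]` and the second is `[2, 4, 6, 8, 10]`. The repeated value here is simply the last one read; which value repeats in a real job depends on the order in which Hadoop delivers the values for a key.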

Posted 2021-05-30
