Mapper behaving differently locally and on the cluster

Posted by l5tcr1uw on 2021-06-03 in Hadoop


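I ran a map-only job (on Hadoop) to sort key-value pairs, because I had read that "Hadoop automatically sorts the data emitted by the mappers before it is sent to the reducers".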
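The sort the quote refers to happens in the shuffle, on the way from the map tasks to the reduce tasks. A job with zero reduce tasks has no shuffle, so the mapper output is written out unsorted. When there are reducers, the default HashPartitioner assigns each key to a reducer with `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, and each reducer receives its keys in sorted order. Every part-r-NNNNN file is therefore sorted on its own, but the files are not sorted relative to each other. The plain-Java simulation below (not Hadoop code; `ShuffleSortSketch`, `partition` and `shuffle` are hypothetical names) sketches this for the sample data, assuming `IntWritable` keys, whose `hashCode()` is the int value itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSortSketch {

    // Same formula as Hadoop's default HashPartitioner. IntWritable.hashCode()
    // returns the int value, so Integer.hashCode(key) gives the same result.
    static int partition(int key, int numReduceTasks) {
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Simulates the shuffle: route each (key, value) pair to a reducer, then sort
    // each reducer's input by key. Returns one list of "key<TAB>value" lines per
    // reducer, i.e. one list per part-r-NNNNN output file.
    static List<List<String>> shuffle(List<Map.Entry<Integer, String>> mapOutput, int numReduceTasks) {
        List<TreeMap<Integer, List<String>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new TreeMap<>()); // TreeMap keeps keys in ascending order
        }
        for (Map.Entry<Integer, String> kv : mapOutput) {
            int p = partition(kv.getKey(), numReduceTasks);
            partitions.get(p).computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        List<List<String>> files = new ArrayList<>();
        for (TreeMap<Integer, List<String>> sorted : partitions) {
            List<String> lines = new ArrayList<>();
            for (Map.Entry<Integer, List<String>> e : sorted.entrySet()) {
                for (String v : e.getValue()) {
                    lines.add(e.getKey() + "\t" + v);
                }
            }
            files.add(lines);
        }
        return files;
    }

    public static void main(String[] args) {
        // Map output of the question's mapper for the sample input: (count, date).
        List<Map.Entry<Integer, String>> mapOutput = List.of(
                Map.entry(835352, "2013-04-15"),
                Map.entry(846299, "2013-04-16"),
                Map.entry(828286, "2013-04-17"),
                Map.entry(747767, "2013-04-18"),
                Map.entry(807924, "2013-04-19"));
        System.out.println("1 reducer:  " + shuffle(mapOutput, 1));
        System.out.println("2 reducers: " + shuffle(mapOutput, 2));
    }
}
```

With one reduce task the single output file is fully sorted by count; with two, even and odd counts land in different files, each sorted only within itself.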

输入文件

2013-04-15      835352
2013-04-16      846299
2013-04-17      828286
2013-04-18      747767
2013-04-19      807924

I thought map(second_column, first_column) would sort this file as shown in output1. When I run the job on my local machine, it does. But when I run it on the cluster, the output looks like output2.

output1 file

747767  2013-04-18
807924  2013-04-19
828286  2013-04-17
835352  2013-04-15
846299  2013-04-16

output2 file

835352  2013-04-15
747767  2013-04-18
807924  2013-04-19
828286  2013-04-17
846299  2013-04-16
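For comparison, here is a plain-Java model of what the mapper below does when the job has no reduce phase, assuming a single map task reading the whole file (`MapOnlySketch` and `mapOnly` are hypothetical names). Each record is written in the order the mapper emits it, which is the input order, so nothing in a map-only job is guaranteed to produce the sorted output1. With several map tasks, each one writes its own part-m-NNNNN file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MapOnlySketch {

    // Mirrors the question's map(): skip '#' lines, take the first token as the date
    // and the second as the count, and emit "count<TAB>date" (TextOutputFormat's
    // default key/value separator is a tab).
    static List<String> mapOnly(List<String> inputLines) {
        List<String> output = new ArrayList<>();
        for (String line : inputLines) {
            if (line.startsWith("#")) {
                continue;
            }
            StringTokenizer tokenizer = new StringTokenizer(line);
            if (!tokenizer.hasMoreTokens()) {
                continue;
            }
            String date = tokenizer.nextToken();
            if (!tokenizer.hasMoreTokens()) {
                continue;
            }
            int count = Integer.parseInt(tokenizer.nextToken());
            // No reduce phase means no shuffle and no sort: records are written
            // in exactly the order the mapper emits them.
            output.add(count + "\t" + date);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "2013-04-15      835352",
                "2013-04-16      846299",
                "2013-04-17      828286",
                "2013-04-18      747767",
                "2013-04-19      807924");
        // Prints the rows in input order (835352 first), not sorted by count.
        mapOnly(input).forEach(System.out::println);
    }
}
```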

How can I guarantee that the output always looks like output1? I am also open to other suggestions for sorting by the second column.
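On a cluster, the usual way to get one totally ordered file is to keep a reduce phase with exactly one reducer (`job.setNumReduceTasks(1)`), or to use `TotalOrderPartitioner` when several reducers are needed. If the file is small enough to fit in memory, sorting it does not need MapReduce at all. A minimal sketch of that alternative follows, assuming whitespace-separated "date count" lines; `SortBySecondColumn` and `sortBySecondColumn` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortBySecondColumn {

    // Sorts whitespace-separated "date count" lines numerically by the count and
    // returns them as "count<TAB>date", the same layout as output1.
    static List<String> sortBySecondColumn(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank and comment lines
            }
            String[] cols = trimmed.split("\\s+");
            if (cols.length >= 2) {
                rows.add(cols);
            }
        }
        // Compare as ints, not strings, so that e.g. 99 sorts before 100.
        rows.sort(Comparator.comparingInt(r -> Integer.parseInt(r[1])));
        List<String> sorted = new ArrayList<>();
        for (String[] r : rows) {
            sorted.add(r[1] + "\t" + r[0]);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "2013-04-15      835352",
                "2013-04-16      846299",
                "2013-04-17      828286",
                "2013-04-18      747767",
                "2013-04-19      807924");
        sortBySecondColumn(input).forEach(System.out::println);
    }
}
```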

Mapper

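import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Swaps the two input columns: emits (count, date) so that the count becomes the key.
public class MapAccessTime1 extends Mapper<LongWritable, Text, IntWritable, Text> {

    // Output objects reused across records; set() updates them in place.
    private IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

        String line = value.toString();
        int val = 0;
        StringTokenizer tokenizer = new StringTokenizer(line);
        if (!line.startsWith("#")) { // skip comment lines
            if (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken()); // first column: the date
            }
            if (tokenizer.hasMoreTokens()) {
                val = Integer.parseInt(tokenizer.nextToken()); // second column: the count
                one.set(val);
                context.write(one, word); // key = count, value = date
            }
        }
    }
}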

hadoop mapreduce Mapper

Source: https://stackoverflow.com/questions/18054619/mapper-behaving-differently-on-local-and-on-the-cluster