map->reduce->reduce (two reducers called sequentially): how to configure the driver program

Posted by yacmzcpb on 2021-05-27 in Hadoop

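I need to write a MapReduce program that calls two reducers in sequence, i.e. the output of the first reducer becomes the input of the second reducer. How can I do this?
So far, I have found that I need to configure two MapReduce jobs in the driver code (code below).
This looks wasteful for two reasons:
I don't really need a mapper in the second job.
Having two jobs seems like overkill.
Is there a better way to achieve this?
Also, a question about the approach below: job1's output will be multiple files in the OUTPUT_PATH directory. This directory is passed as the input to job2. Is that OK? Doesn't the input have to be a single file? Will job2 process all the files under the given directory?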

  Configuration conf = getConf();
  FileSystem fs = FileSystem.get(conf);
  // Job.getInstance replaces the deprecated Job(Configuration, String) constructor
  Job job = Job.getInstance(conf, "Job1");
  job.setJarByClass(ChainJobs.class);

  job.setMapperClass(MyMapper1.class);
  job.setReducerClass(MyReducer1.class);

  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);

  job.setInputFormatClass(TextInputFormat.class);
  job.setOutputFormatClass(TextOutputFormat.class);

  TextInputFormat.addInputPath(job, new Path(args[0]));
  TextOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));

  // Block until job 1 finishes. Job 2 depends on its output, so stop here if it failed.
  if (!job.waitForCompletion(true)) {
    return 1;
  }

  /*
   * Job 2
   */
  Configuration conf2 = getConf();
  Job job2 = Job.getInstance(conf2, "Job 2");
  job2.setJarByClass(ChainJobs.class);

  job2.setMapperClass(MyMapper2.class);
  job2.setReducerClass(MyReducer2.class);

  job2.setOutputKeyClass(Text.class);
  job2.setOutputValueClass(Text.class);

  job2.setInputFormatClass(TextInputFormat.class);
  job2.setOutputFormatClass(TextOutputFormat.class);

  TextInputFormat.addInputPath(job2, new Path(OUTPUT_PATH));
  TextOutputFormat.setOutputPath(job2, new Path(args[1]));

  return job2.waitForCompletion(true) ? 0 : 1;
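
To make the data flow in the question concrete, here is a minimal plain-Java simulation of map -> reduce -> reduce. It is only a sketch and uses no Hadoop classes: the class name `TwoStageReduceSketch`, the helper names, and the word-count example are all made up for illustration. It assumes reducer 1 emits records that are already keyed the way reducer 2 needs them, so the second "map" step is a pass-through, and it models job 1's output directory as a list of part files that job 2 reads in full.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of map -> reduce -> reduce (hypothetical example, no Hadoop).
class TwoStageReduceSketch {

    // Shuffle: group values by key with keys sorted, as happens before each reduce phase.
    static <K extends Comparable<K>, V> TreeMap<K, List<V>> shuffle(List<Map.Entry<K, V>> records) {
        TreeMap<K, List<V>> grouped = new TreeMap<>();
        for (Map.Entry<K, V> r : records) {
            grouped.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        return grouped;
    }

    // Job 1 mapper: emit (word, 1) for each whitespace-separated word.
    static List<Map.Entry<String, Integer>> map1(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(new SimpleEntry<>(word, 1));
                }
            }
        }
        return out;
    }

    // Job 1 reducer: sum the counts and emit (count, word). Records are spread over
    // numReducers partitions, standing in for the part files in the output directory.
    static List<List<Map.Entry<Integer, String>>> reduce1(
            Map<String, List<Integer>> grouped, int numReducers) {
        List<List<Map.Entry<Integer, String>>> partFiles = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partFiles.add(new ArrayList<>());
        }
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int c : e.getValue()) {
                sum += c;
            }
            int partition = (e.getKey().hashCode() & Integer.MAX_VALUE) % numReducers;
            partFiles.get(partition).add(new SimpleEntry<>(sum, e.getKey()));
        }
        return partFiles;
    }

    // Job 2 input plus identity mapper: read every part file, pass records through unchanged.
    static List<Map.Entry<Integer, String>> readAllParts(
            List<List<Map.Entry<Integer, String>>> partFiles) {
        List<Map.Entry<Integer, String>> all = new ArrayList<>();
        for (List<Map.Entry<Integer, String>> part : partFiles) {
            all.addAll(part);
        }
        return all;
    }

    // Job 2 reducer: for each count, join the words that occurred that many times.
    static Map<Integer, String> reduce2(Map<Integer, List<String>> grouped) {
        Map<Integer, String> out = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> e : grouped.entrySet()) {
            List<String> words = new ArrayList<>(e.getValue());
            Collections.sort(words); // value order depends on partitioning, so sort for stable output
            out.put(e.getKey(), String.join(",", words));
        }
        return out;
    }

    // Driver: job 1 (map + reduce), then job 2 (identity map + reduce).
    static Map<Integer, String> run(List<String> lines, int numReducers) {
        List<List<Map.Entry<Integer, String>>> partFiles =
                reduce1(shuffle(map1(lines)), numReducers);
        return reduce2(shuffle(readAllParts(partFiles)));
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "c b d"), 2)); // prints {1=c,d, 2=a,b}
    }
}
```

In Hadoop itself, as far as I know, a second reduce still needs its own shuffle and therefore its own job; `ChainReducer` chains mappers around a single reducer rather than running one reducer after another. The pass-through mapper is cheap to get, though: if `setMapperClass` is never called on job 2, the base `Mapper` class forwards each record unchanged. With `TextInputFormat` the key it forwards is the line's byte offset, so a key/value format is probably a better fit for the intermediate data, for example `SequenceFileOutputFormat` on job 1 with `SequenceFileInputFormat` on job 2. Passing a directory as the input path is fine: `FileInputFormat` reads every file in it and skips names starting with `_` or `.`, such as `_SUCCESS`.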

Java hadoop mapreduce

Source: https://stackoverflow.com/questions/60581691/map-reduce-reduce-two-reducers-to-be-called-sequentially-how-to-config