Is it possible to execute independent MapReduce jobs (rather than chaining them, where the reducer's output becomes the next mapper's input)? They can run one after the other.
57hvy0tb1#
Call two methods, runFirstJob and runSecondJob, from the driver code, like this. This is only a hint; modify it as needed.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ExerciseDriver {

    static Configuration conf;

    public static void main(String[] args) throws Exception {
        ExerciseDriver ED = new ExerciseDriver();
        conf = new Configuration();
        if (args.length < 4) {
            System.out.println("Too few arguments. Arguments should be: "
                + "<hdfs input folder 1> <hdfs output folder 1> "
                + "<hdfs input folder 2> <hdfs output folder 2>");
            System.exit(0);
        }
        String pathin1stmr  = args[0];
        String pathout1stmr = args[1];
        String pathin2ndmr  = args[2];
        String pathout2ndmr = args[3];
        // The jobs run strictly one after the other: the second job is not
        // submitted until the first one has completed.
        ED.runFirstJob(pathin1stmr, pathout1stmr);
        ED.runSecondJob(pathin2ndmr, pathout2ndmr);
    }

    public int runFirstJob(String pathin, String pathout) throws Exception {
        Job job = Job.getInstance(conf);
        job.setJarByClass(ExerciseDriver.class);
        job.setMapperClass(ExerciseMapper1.class);
        job.setCombinerClass(ExerciseCombiner.class);
        job.setReducerClass(ExerciseReducer1.class);
        job.setInputFormatClass(ParagrapghInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(pathin));
        FileOutputFormat.setOutputPath(job, new Path(pathout));
        // waitForCompletion submits the job and blocks until it finishes.
        boolean success = job.waitForCompletion(true);
        return success ? 0 : -1;
    }

    public int runSecondJob(String pathin, String pathout) throws Exception {
        Job job = Job.getInstance(conf);
        job.setJarByClass(ExerciseDriver.class);
        job.setMapperClass(ExerciseMapper2.class);
        job.setReducerClass(ExerciseReducer2.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(pathin));
        FileOutputFormat.setOutputPath(job, new Path(pathout));
        boolean success = job.waitForCompletion(true);
        return success ? 0 : -1;
    }
}
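One thing worth adding to the hint: main ignores the return values, so the second job is launched even if the first one fails. A small guard in main, reusing the method names above, keeps the sequence honest:

    // Run the second job only if the first one succeeded.
    if (ED.runFirstJob(pathin1stmr, pathout1stmr) == 0) {
        ED.runSecondJob(pathin2ndmr, pathout2ndmr);
    } else {
        System.err.println("First job failed; skipping the second job.");
        System.exit(1);
    }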
2vuwiymt2#
You can also run the two jobs in parallel. Sample code is given below:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    Path Job1InputDir  = new Path(args[0]);
    Path Job2InputDir  = new Path(args[1]);
    Path Job1OutputDir = new Path(args[2]);
    Path Job2OutputDir = new Path(args[3]);

    // submitJob configures a job and submits it without waiting for it.
    Job Job1 = submitJob(conf, Job1InputDir, Job1OutputDir);
    Job Job2 = submitJob(conf, Job2InputDir, Job2OutputDir);

    // While both jobs are not finished, sleep.
    while (!Job1.isComplete() || !Job2.isComplete()) {
        Thread.sleep(5000);
    }

    if (Job1.isSuccessful()) {
        System.out.println("Job1 completed successfully!");
    } else {
        System.out.println("Job1 failed!");
    }

    if (Job2.isSuccessful()) {
        System.out.println("Job2 completed successfully!");
    } else {
        System.out.println("Job2 failed!");
    }

    System.exit(Job1.isSuccessful() && Job2.isSuccessful() ? 0 : 1);
}
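The submitJob helper is not shown in the answer. A minimal sketch of what it might look like, assuming it must return without blocking (job.submit() launches the job asynchronously); the mapper, reducer, and driver class names here are placeholders, not from the answer:

// Hypothetical helper: configure a job and submit it asynchronously.
// MyMapper, MyReducer, and ParallelJobsDriver are placeholder names.
public static Job submitJob(Configuration conf, Path inputDir, Path outputDir)
        throws Exception {
    Job job = Job.getInstance(conf);
    job.setJarByClass(ParallelJobsDriver.class);
    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, inputDir);
    FileOutputFormat.setOutputPath(job, outputDir);
    job.submit(); // returns immediately; the job runs in the background
    return job;
}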
tpxzln5u3#
If you want to execute them one after another, you can chain the jobs as described in the link below:
http://unmeshasreeveni.blogspot.in/2014/04/chaining-jobs-in-hadoop-mapreduce.html
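For reference, one common way to chain dependent jobs in the mapreduce API is JobControl/ControlledJob. A minimal sketch, assuming conf is a Configuration and jobA/jobB are already-configured Job instances where jobB reads jobA's output:

import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

// Wrap each configured Job in a ControlledJob and declare the dependency.
ControlledJob cJobA = new ControlledJob(conf);
cJobA.setJob(jobA);
ControlledJob cJobB = new ControlledJob(conf);
cJobB.setJob(jobB);
cJobB.addDependingJob(cJobA); // jobB starts only after jobA succeeds

JobControl control = new JobControl("chained-jobs");
control.addJob(cJobA);
control.addJob(cJobB);

// JobControl implements Runnable; drive it from a background thread
// and poll until every job in the group has finished.
Thread controlThread = new Thread(control);
controlThread.setDaemon(true);
controlThread.start();
while (!control.allFinished()) {
    Thread.sleep(5000);
}
control.stop();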