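I have a 5 GB file and am running a simple word-count MapReduce job on it. The block size is 128 MB, and it is a single-node cluster. After reading 1.2 million records, the job starts reading the same file again from the beginning. The pseudocode is below.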
Configuration objconf = new Configuration();
Path objInputPath = new Path("/home/abc/Desktop/Debug.csv");
Path objoutPath = new Path("/home/abc/Desktop/Outpath.csv");
Job objJob = new Job(objconf, "WordCount");
FileInputFormat.setInputPaths(objJob, objInputPath);
FileOutputFormat.setOutputPath(objJob, objoutPath);
objJob.setJarByClass(WordCount.class);
objJob.setMapperClass(WCMapper.class);
objJob.setJobName("WordCount");
objJob.setInputFormatClass(TextInputFormat.class);
objJob.setOutputFormatClass(TextOutputFormat.class);
int j = objJob.waitForCompletion(true) ? 0 : 1;
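For context on the sizes mentioned above: a 5 GB input with a 128 MB block size is normally not read by one mapper from start to finish. Assuming the split size equals the block size (the usual default for FileInputFormat), the input would be divided into roughly 40 splits, each handled by its own map task starting at its own offset. The arithmetic can be sketched as follows; the class and method names are hypothetical, and the calculation ignores Hadoop's special handling of a small trailing remainder.

```java
public class SplitEstimate {
    // Rough number of input splits, ASSUMING split size == block size.
    // This is plain ceiling division, not Hadoop's exact split logic.
    public static long estimateSplits(long fileBytes, long splitBytes) {
        return (fileBytes + splitBytes - 1) / splitBytes;
    }

    public static void main(String[] args) {
        long fileBytes = 5L * 1024 * 1024 * 1024; // 5 GB, as stated in the question
        long splitBytes = 128L * 1024 * 1024;     // 128 MB block size, as stated
        System.out.println(estimateSplits(fileBytes, splitBytes)); // prints 40
    }
}
```

If the observed record count per task is close to the total record count divided by 40, that may be worth checking against the per-task logs before concluding that the file is being re-read.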
Mapper.java
private IntWritable one = new IntWritable(1);
private Text word = new Text();
String line = value.toString();
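The mapper fragment above stops at `value.toString()`. For illustration only, here is a sketch of how a typical word-count map step might continue, written in plain Java without Hadoop types so that it runs on its own. The class `MapStepSketch`, the method `mapLine`, and the whitespace tokenization are assumptions, not the original WCMapper code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

public class MapStepSketch {
    // Emits one (word, 1) pair per whitespace-separated token in the line.
    // ASSUMPTION: tokens are split on whitespace; for CSV input a real
    // mapper might split on commas instead.
    public static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Each token in the sample line yields its own (word, 1) pair.
        for (Map.Entry<String, Integer> pair : mapLine("a b a")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

In an actual Hadoop mapper, each pair would be written with `context.write(word, one)` using the `Text` and `IntWritable` fields shown above rather than collected in a list.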