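I am new to Hadoop and ran into this problem. I am trying to change the default Text, IntWritable output types to Text, Text. I want the mapper to emit Text, IntWritable, and in the reducer I want two counters that depend on the value; the reducer then writes both counters to the collector as a single Text.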
public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        String[] words = line.split(",");
        String[] date = words[2].split(" ");
        word.set(date[0] + " " + date[1] + " " + date[2]);
        if (words[0].contains("0"))
            one.set(0);
        else
            one.set(4);
        output.collect(word, one);
    }
}
-----------------------------------------------------------------------------------
public class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, Text> {
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException {
        int sad = 0;
        int happy = 0;
        while (values.hasNext()) {
            IntWritable value = (IntWritable) values.next();
            if (value.get() == 0)
                sad++; // process value
            else
                happy++;
        }
        output.collect(key, new Text("sad:" + sad + ", happy:" + happy));
    }
}
---------------------------------------------------------------------------------
public class WordCount {
    public static void main(String[] args) {
        JobClient client = new JobClient();
        JobConf conf = new JobConf(WordCount.class);
        // specify output types
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        // specify input and output dirs
        FileInputFormat.addInputPath(conf, new Path("input"));
        FileOutputFormat.setOutputPath(conf, new Path("output"));
        // specify a mapper
        conf.setMapperClass(WordCountMapper.class);
        // specify a reducer
        conf.setReducerClass(WordCountReducer.class);
        conf.setCombinerClass(WordCountReducer.class);
        client.setConf(conf);
        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I get this error:
14/12/10 18:11:01 INFO mapred.JobClient: Task Id : attempt_201412100143_0008_m_0000000, Status : FAILED
java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:425)
    at WordCountMapper.map(WordCountMapper.java:31)
    at WordCountMapper.map(WordCountMapper.java:1)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
Caused by: java.io.IOException: wrong value class: class org.apache.hadoop.io.Text is not class org.apache.hadoop.io.IntWritable
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:143)
    at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:626)
    at WordCountReducer.reduce(WordCountReducer.java:29)
    at WordCountReducer.reduce(WordCountReducer.java:1)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:904)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:785)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1600(MapTask.java:286)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:712)
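After this, the error repeats several times. Can someone explain why this error occurs? I searched for similar errors, but everything I found was about mismatched key-value types between the mapper and the reducer, and as far as I can see the key-value types of my mapper and reducer do match. Thanks in advance.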
2 answers
uqdfh47h #1
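Try commenting out
conf.setCombinerClass(WordCountReducer.class);
and run the job again. The spill error shows up when the map-side data buffer fills: the combiner runs during the spill, and WordCountReducer, used as the combiner, emits Text values where the map output expects IntWritable.
Also set the map output key and value classes on the JobConf, since the mapper and the reducer emit different key-value data types.
If both emit the same data types, then setting the output key and value classes alone is enough.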
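To make the "wrong value class" message concrete, here is a small sketch that runs without Hadoop. It is only a model: `CheckedCollector` and `tryCollect` are hypothetical names, `Integer` stands in for IntWritable and `String` for Text. The point it illustrates is that a combiner's output goes back into the map output stream, so a combiner has to emit the same value type the mapper emits.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CombinerTypeCheckSketch {

    /** Hypothetical stand-in for the collector that receives combiner output during a spill. */
    static final class CheckedCollector {
        private final Class<?> expectedValueClass;
        private final List<Object> values = new ArrayList<>();

        CheckedCollector(Class<?> expectedValueClass) {
            this.expectedValueClass = expectedValueClass;
        }

        void collect(String key, Object value) throws IOException {
            // Reject any value whose class differs from the declared map output value class.
            if (value.getClass() != expectedValueClass) {
                throw new IOException("wrong value class: " + value.getClass().getName()
                        + " is not " + expectedValueClass.getName());
            }
            values.add(value);
        }

        int size() {
            return values.size();
        }
    }

    /** Returns null if the value was accepted, otherwise the error message. */
    static String tryCollect(CheckedCollector collector, String key, Object value) {
        try {
            collector.collect(key, value);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Integer plays the role of the IntWritable map output value type.
        CheckedCollector spill = new CheckedCollector(Integer.class);
        // A mapper-style value is accepted.
        System.out.println(tryCollect(spill, "Mon Apr 06", 4));
        // A reducer-style summary string is rejected, like Text in the stack trace.
        System.out.println(tryCollect(spill, "Mon Apr 06", "sad:0, happy:1"));
    }
}
```

In the real job the same mismatch arises because WordCountReducer is registered as the combiner while it emits Text values.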
bvn4nwqk #2
In this line of the WordCount class, it should be
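A sketch of how the driver might look with the suggestions in this thread applied. It rests on two assumptions that the surviving text does not state outright: that the map output classes are set with `setMapOutputKeyClass` and `setMapOutputValueClass`, and that the final output value class should be Text because WordCountReducer is declared with Text output values. It uses the old `org.apache.hadoop.mapred` API, as the question does, and needs the Hadoop libraries on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCount {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(WordCount.class);

        // Intermediate (map output) types: what WordCountMapper emits.
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);

        // Final output types: what WordCountReducer emits.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        // Input and output directories, as in the question.
        FileInputFormat.addInputPath(conf, new Path("input"));
        FileOutputFormat.setOutputPath(conf, new Path("output"));

        conf.setMapperClass(WordCountMapper.class);
        conf.setReducerClass(WordCountReducer.class);
        // No setCombinerClass call: WordCountReducer emits Text values,
        // so it cannot feed its output back into the IntWritable map output.

        JobClient.runJob(conf);
    }
}
```

If a combiner is wanted later, it would have to be a separate class whose output types are Text, IntWritable, which this two-counter summary does not fit directly.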