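I have a MapReduce job that takes an Avro file of type T as input
and should output key/value pairs as a sequence file. The job has only a mapper. Here is the code for the mapper and the driver:
Mapper: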
public class AvroReaderMapper extends Mapper<LongWritable, AvroValue<ContentPackage>, Text, Text> {
    @Override
    public void map(LongWritable k, AvroValue<ContentPackage> record, Context context) throws IOException, InterruptedException {
        // some processing
    }
}
Driver:
public class SeqFileGenerator extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new SeqFileGenerator(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] arg0) throws Exception {
        // Job configuration (uses the Configuration supplied by ToolRunner)
        Job job = Job.getInstance(getConf());
        job.setJarByClass(SeqFileGenerator.class);
        job.setJobName("Sequence File Generator");
        // 1 - set the input and output paths
        FileInputFormat.setInputPaths(job, new Path("in"));
        FileOutputFormat.setOutputPath(job, new Path("out"));
        // 2 - set the mapper class (there is no reducer)
        job.setMapperClass(AvroReaderMapper.class);
        // 3 - set the input/output format
        AvroJob.setInputValueSchema(job, ContentPackage.SCHEMA$);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        // 4 - run the job and report success or failure through the exit code
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
When I run it, it fails with the following error message:
java.lang.Exception: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.avro.mapred.AvroValue
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
How can I fix this?
1 Answer
nnt7mjpx (#1)
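You have to set the correct input format for the job in the driver class. By default the job uses TextInputFormat, which hands the mapper LongWritable keys and Text values. That is why a Text ends up being cast to the AvroValue the mapper expects.
Try adding the following line to the driver class:
job.setInputFormatClass(AvroKeyInputFormat.class);
Note that AvroKeyInputFormat delivers each record as an AvroKey key with a NullWritable value, so the mapper's input types should become AvroKey<ContentPackage> and NullWritable, and the schema should be registered with AvroJob.setInputKeySchema rather than setInputValueSchema.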
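To illustrate why a wrong input format surfaces as a ClassCastException at run time rather than as a compile error, here is a minimal sketch in plain Java. Every class in it (Text, AvroValue, Mapper, AvroReaderMapper, runTask) is a simplified stand-in written for this illustration, not the real Hadoop or Avro class. The assumption is that the framework passes records to the mapper through erased generic types, much as the task runner does with whatever the configured input format produces.

```java
// Stand-in types for illustration only; these are NOT the real Hadoop/Avro classes.
class MismatchDemo {
    /** Plays the role of the value type produced by a text-based input format. */
    static final class Text {
        final String line;
        Text(String line) { this.line = line; }
    }

    /** Plays the role of an Avro wrapper around a deserialized record. */
    static final class AvroValue<T> {
        private final T datum;
        AvroValue(T datum) { this.datum = datum; }
        T datum() { return datum; }
    }

    /** Simplified mapper contract with generic input key and value types. */
    static abstract class Mapper<K, V> {
        abstract String map(K key, V value);
    }

    /** Mirrors the question's mapper: it declares AvroValue as its input value type. */
    static final class AvroReaderMapper extends Mapper<Long, AvroValue<String>> {
        @Override
        String map(Long key, AvroValue<String> record) {
            return record.datum();
        }
    }

    /**
     * Plays the role of the task runner. It only sees the raw (erased) Mapper type,
     * so the compiler cannot check that the value matches what the mapper declared.
     */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String runTask(Mapper mapper, Object key, Object value) {
        return mapper.map(key, value);
    }

    public static void main(String[] args) {
        Mapper<Long, AvroValue<String>> mapper = new AvroReaderMapper();

        // Input format and mapper agree: the record passes through.
        System.out.println(runTask(mapper, 0L, new AvroValue<>("record-1")));

        // Input format produces Text while the mapper expects AvroValue:
        // the cast inside the mapper's generated bridge method fails.
        try {
            runTask(mapper, 0L, new Text("raw line"));
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

The same reasoning applies to the real job: the input format decides which key and value types reach the mapper, so the mapper's declared input types have to match the input format that is actually configured.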