Error in a MapReduce job that takes an Avro file and outputs a sequence file

Posted by mzsu5hc0 on 2021-05-30 in Hadoop


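I have a MapReduce job that takes an Avro file of records of type T (ContentPackage in the code) as input and should output (Text, Text) pairs as a sequence file. The job has only a mapper. Here is the code for the mapper and the driver:
Mapper: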

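import java.io.IOException;

import org.apache.avro.mapred.AvroValue;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AvroReaderMapper extends Mapper<LongWritable, AvroValue<ContentPackage>, Text, Text> {

    @Override
    public void map(LongWritable k, AvroValue<ContentPackage> record, Context context)
            throws IOException, InterruptedException {
        // some processing
    }
}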

Driver:

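import org.apache.avro.mapreduce.AvroJob;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SeqFileGenerator extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new SeqFileGenerator(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] arg0) throws Exception {
        // Job configuration (Job.getInstance replaces the deprecated Job constructor)
        Job job = Job.getInstance(getConf());
        job.setJarByClass(SeqFileGenerator.class);
        job.setJobName("Sequence File Generator");

        // 1 - set the input and output paths
        FileInputFormat.setInputPaths(job, new Path("in"));
        FileOutputFormat.setOutputPath(job, new Path("out"));

        // 2 - set the mapper class (no reducer class is set)
        job.setMapperClass(AvroReaderMapper.class);

        // 3 - set the input/output format
        AvroJob.setInputValueSchema(job, ContentPackage.SCHEMA$);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        // 4 - run the job and report failure through the exit code
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
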
When I run it, I get the following error message:

java.lang.Exception: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.avro.mapred.AvroValue
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)

How can I fix this?
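
A note on the likely cause, offered as a reading of the code above rather than a confirmed diagnosis: the driver never calls job.setInputFormatClass, so the job falls back to Hadoop's default TextInputFormat, which hands the mapper LongWritable keys and Text values. The mapper is declared to receive AvroValue values, and because Java generics are erased, the mismatch only shows up at runtime as the ClassCastException in the trace. The sketch below reproduces that mechanism with small stand-in classes. Text, AvroValue, Mapper and runTask here are hypothetical simplifications written for this illustration, not the real Hadoop or Avro types.

```java
public class InputTypeMismatchDemo {

    // Hypothetical stand-in for org.apache.hadoop.io.Text.
    public static class Text {
        public final String s;
        public Text(String s) { this.s = s; }
    }

    // Hypothetical stand-in for org.apache.avro.mapred.AvroValue<T>.
    public static class AvroValue<T> {
        public final T datum;
        public AvroValue(T datum) { this.datum = datum; }
    }

    // Hypothetical stand-in for Mapper: at runtime only the erased types remain.
    public abstract static class Mapper<K, V> {
        public abstract String map(K key, V value);
    }

    // Declared like the mapper in the question: it expects AvroValue values.
    public static class AvroReaderMapper extends Mapper<Long, AvroValue<String>> {
        @Override
        public String map(Long key, AvroValue<String> value) {
            return value.datum;
        }
    }

    // Stand-in for the task runner: it passes along whatever the input format
    // produced, without knowing the mapper's declared type parameters.
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static String runTask(Mapper mapper, Object key, Object value) {
        try {
            return mapper.map(key, value);
        } catch (ClassCastException e) {
            return "ClassCastException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Mapper<Long, AvroValue<String>> mapper = new AvroReaderMapper();
        // Value type matches the mapper's declaration: the call succeeds.
        System.out.println(runTask(mapper, 0L, new AvroValue<>("record")));
        // Value type is Text, as a text-based input format would supply:
        // the compiler-generated bridge method's cast fails at runtime.
        System.out.println(runTask(mapper, 0L, new Text("line")));
    }
}
```

If that reading is right, one possible fix (not run against this job) is to set the input format explicitly with job.setInputFormatClass(AvroKeyInputFormat.class), register the schema with AvroJob.setInputKeySchema(job, ContentPackage.SCHEMA$), and declare the mapper as Mapper<AvroKey<ContentPackage>, NullWritable, Text, Text>, since AvroKeyInputFormat delivers each record as the key with a NullWritable value.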

hadoop mapreduce avro serialization

Source: https://stackoverflow.com/questions/29570041/error-in-mapreduce-job-taking-avro-file-and-outputting-a-sequence-file