First Hadoop program using map and reducer

Posted by arknldoa on 2021-06-02 in Hadoop

I am trying to compile my first Hadoop program. My input file looks like this:

1 54875451 2015 LA89LP
2 47451451 2015 LA89LP
3 878451 2015 LA89LP
4 54875 2015 LA89LP
5 2212 2015 LA89LP

When I compile and run it, I get map 100%, reduce 0%, and a java.lang.Exception: java.util.NoSuchElementException with a long stack trace that includes:
java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
I really don't understand why. Any help is much appreciated.
My mapper and reducer look like this:

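import java.io.IOException;
import java.util.HashSet;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Draft {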

     public static class TokenizerMapper extends Mapper<Object, Text, Text, Text>{

     private Text word = new Text(); 
     private Text word2 = new Text();     

     public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {

       StringTokenizer itr = new StringTokenizer(value.toString());

       while (itr.hasMoreTokens()) {

       String id = itr.nextToken();
       String price = itr.nextToken();
       String dateTransfer = itr.nextToken();
       String postcode = itr.nextToken();

       word.set(postcode);
       word2.set(price);
       context.write(word, word2);
    }
  }
}

  public static class MaxReducer extends Reducer<Text,Text,Text,Text> {

    private Text word = new Text();
    private Text word2 = new Text();

    public void reduce(Text key, Iterable<Text> values, Context context
                       ) throws IOException, InterruptedException {
      String max = "0";
      HashSet<String> S = new HashSet<String>();

    for (Text val: values) {
        String d = key.toString();
        String price = val.toString(); 
        if (S.contains(d)) {
            if (Integer.parseInt(price)>Integer.parseInt(max)) max = price;
        } else {
            S.add(d);
            max = price;
        }
    }      

    word.set(key.toString());
    word2.set(max);
    context.write(word, word2);

    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "Draft");
    job.setJarByClass(Draft.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(MaxReducer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class); // output key type for the job (reducer output)
    job.setOutputValueClass(Text.class); // output value type for the job (reducer output)
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Answer #1 by brccelvz

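This error occurs when some records have fewer than 4 fields. The code in the mapper assumes that every record contains 4 fields: id, price, dateTransfer and postcode.
However, some records may not contain all 4 fields.
For example, if the record is: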

1 54875451 2015

then the following line throws an exception (java.util.NoSuchElementException):

String postcode = itr.nextToken();

You are trying to assign postcode (assumed to be the 4th field), but the input record has only 3 fields.
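The failure can be reproduced outside Hadoop with a plain StringTokenizer. This is only a minimal sketch: the class name TokenizerDemo and the helpers readFourthField and failsOn are hypothetical names made up for illustration, not part of the question's code.

```java
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

public class TokenizerDemo {

    // Hypothetical helper: reads four tokens the same way the mapper does
    // and returns the fourth one (the postcode).
    static String readFourthField(String line) {
        StringTokenizer itr = new StringTokenizer(line);
        itr.nextToken();        // id
        itr.nextToken();        // price
        itr.nextToken();        // dateTransfer
        return itr.nextToken(); // postcode: throws if there is no fourth token
    }

    // Returns true if reading four fields from the line
    // throws NoSuchElementException.
    static boolean failsOn(String line) {
        try {
            readFourthField(line);
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsOn("1 54875451 2015 LA89LP")); // prints false
        System.out.println(failsOn("1 54875451 2015"));        // prints true
    }
}
```

Note that the mapper in the question keeps looping while tokens remain, so a line whose token count is larger than four but not a multiple of four would presumably fail the same way on the second pass through the loop.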
To fix this, you need to add validation in the map() method. Since you only emit postcode and price from map(), you can change the code as follows:

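// Split on runs of whitespace so tabs and repeated spaces are handled too.
String[] tokens = value.toString().trim().split("\\s+");

String price = "";
String postcode = "";

// Only read a field if the record is long enough to contain it.
if (tokens.length >= 2) {
    price = tokens[1];
}

if (tokens.length >= 4) {
    postcode = tokens[3];
}

// Emit only when a price is present; the postcode may still be empty here.
if (!price.isEmpty()) {
    word.set(postcode);
    word2.set(price);
    context.write(word, word2);
}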
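The same validation can also be pulled into a small helper that is testable without a cluster. The sketch below is an assumption-laden variant, not the code above: RecordParser, Record and parse are hypothetical names, and it is stricter than the snippet above because it skips any record that does not have all 4 fields, so no empty postcode key is ever emitted.

```java
import java.util.Optional;

public class RecordParser {

    // Hypothetical value holder for the two fields the mapper emits.
    static final class Record {
        final String postcode;
        final String price;

        Record(String postcode, String price) {
            this.postcode = postcode;
            this.price = price;
        }
    }

    // Returns the postcode and price of a well-formed line, or an empty
    // Optional when the line has fewer than 4 whitespace-separated fields.
    static Optional<Record> parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 4) {
            return Optional.empty(); // malformed record: the caller should skip it
        }
        return Optional.of(new Record(tokens[3], tokens[1]));
    }

    public static void main(String[] args) {
        System.out.println(parse("1 54875451 2015 LA89LP").isPresent()); // prints true
        System.out.println(parse("1 54875451 2015").isPresent());        // prints false
    }
}
```

Inside map(), one would call parse(value.toString()) and write to the context only when a result is present. Whether to skip short records or emit them under an empty key depends on what the job is meant to report.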
