Big data Hadoop Java code for WordCount, modified

yr9zkbsy posted on 2021-06-02 in Hadoop

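I have to modify the Hadoop WordCount example to count the words that start with the prefix "cons", and then sort the results in descending order of frequency. Can anyone tell me how to write the mapper and reducer code for this?
Code: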

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        // Replace all digits and punctuation with an empty string, then lowercase
        String line = value.toString().replaceAll("\\p{Punct}|\\d", "").toLowerCase();
        // Extract the words
        StringTokenizer record = new StringTokenizer(line);
        // Emit each word as a key with 1 as its value
        while (record.hasMoreTokens()) {
            context.write(new Text(record.nextToken()), new IntWritable(1));
        }
    }
}

ulydmbyx #1

To count the words that start with "cons", you can discard every other word when emitting from the mapper.

public void map(Object key, Text value, Context context) throws IOException,
        InterruptedException {
    IntWritable one = new IntWritable(1);
    String[] words = value.toString().split(" ");
    for (String word : words) {
        // Emit a single shared key for every word with the prefix
        if (word.startsWith("cons")) {
            context.write(new Text("cons_count"), one);
        }
    }
}

The reducer now receives only one key, cons_count, and you can sum its values to get the count.
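The summing step is not shown in the answer. A minimal plain-Java sketch of the same logic, with no Hadoop dependency so it can be run locally, might look like the following. The class and method names (`ConsCountSketch`, `mapLine`, `reduce`) are hypothetical, and splitting on whitespace instead of a single space is an assumption. In an actual Hadoop reducer the same loop would sit inside `reduce(Text key, Iterable<IntWritable> values, Context context)`, followed by `context.write(key, new IntWritable(sum))`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical local stand-in for the mapper/reducer pair described above.
class ConsCountSketch {

    // Mapper side: emit one 1 for every word that starts with the prefix.
    static List<Integer> mapLine(String line, String prefix) {
        List<Integer> emitted = new ArrayList<>();
        // Assumption: split on any run of whitespace rather than a single space.
        for (String word : line.trim().split("\\s+")) {
            if (word.startsWith(prefix)) {
                emitted.add(1);
            }
        }
        return emitted;
    }

    // Reducer side: sum all values that arrived under the single key.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (String line : Arrays.asList("consider the console", "a constant cost")) {
            values.addAll(mapLine(line, "cons"));
        }
        // "consider", "console" and "constant" match; "cost" does not.
        System.out.println("cons_count\t" + reduce(values)); // prints cons_count	3
    }
}
```

Because every matching word is emitted under one key, a single reducer call sees all the ones and the sum is the total count.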
To sort the words that start with "cons" by frequency, all of those words should go to the same reducer, which then tallies and sorts them. To do that:

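import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<Object, Text, Text, Text> {

    @Override
    public void map(Object key, Text value, Context context) throws IOException,
            InterruptedException {
        String[] words = value.toString().split(" ");
        for (String word : words) {
            // Use one shared key so that a single reducer sees every matching word
            if (word.startsWith("cons")) {
                context.write(new Text("cons"), new Text(word));
            }
        }
    }
}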

Reducer:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, Text, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Tally how often each word occurs
        Map<String, Integer> wordCountMap = new HashMap<String, Integer>();
        for (Text value : values) {
            String word = value.toString();
            if (wordCountMap.containsKey(word)) {
                wordCountMap.put(word, wordCountMap.get(word) + 1);
            } else {
                wordCountMap.put(word, 1);
            }
        }

        // use some sorting mechanism to sort the map based on values.
        // ...

        // Emit each word with its count
        for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
            context.write(new Text(entry.getKey()), new IntWritable(entry.getValue()));
        }
    }
}
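The reducer leaves the sorting step as a comment. One way it could be filled in, as a sketch rather than the answerer's own method, is to copy the map entries into a list and sort that list by count in descending order. The class and method names here (`SortByCountSketch`, `sortByCountDescending`) are hypothetical, and breaking ties alphabetically is an assumption the original does not specify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the "sort the map based on values" step.
class SortByCountSketch {

    // Returns the entries ordered by count, highest first.
    // Assumption: ties are broken alphabetically by word.
    static List<Map.Entry<String, Integer>> sortByCountDescending(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("console", 2);
        counts.put("constant", 5);
        counts.put("consider", 2);
        // Prints constant (5), then consider (2), then console (2).
        for (Map.Entry<String, Integer> e : sortByCountDescending(counts)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In the reducer, the final loop would then iterate over the sorted list instead of `wordCountMap.entrySet()`. This keeps every distinct word in memory on one reducer, which is reasonable only while the set of "cons" words stays small.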
