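I have to modify the Hadoop WordCount example so that it counts the words that start with the prefix "cons", and then sort the results by frequency in descending order. Can anyone tell me how to write the mapper and reducer code for this?
Code: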
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        // Remove all digits and punctuation, then lower-case the line
        String line = value.toString().replaceAll("\\p{Punct}|\\d", "").toLowerCase();
        // Split the line into words
        StringTokenizer record = new StringTokenizer(line);
        // Emit each word as the key and 1 as its value
        while (record.hasMoreTokens())
            context.write(new Text(record.nextToken()), new IntWritable(1));
    }
}
1 Answer
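To count the words that start with "cons", you can discard every other word in the mapper and emit each matching word under one fixed key, cons_count.
The reducer then receives only that single key (key = cons_count), and you can sum its values to get the count.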
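A minimal sketch of that approach. It assumes the Hadoop mapreduce API (hadoop-client on the classpath) and reuses the question's tokenization. The class names ConsTotalCount, ConsCountMapper and SumReducer, along with the helper isConsWord, are hypothetical. The key string cons_count follows the answer.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class ConsTotalCount {

    // Hypothetical helper: true when an already lower-cased token starts with "cons"
    public static boolean isConsWord(String word) {
        return word.startsWith("cons");
    }

    public static class ConsCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        // Every matching word is emitted under this one key
        private static final Text CONS_COUNT = new Text("cons_count");
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Same cleanup as the question: strip punctuation and digits, lower-case
            String line = value.toString().replaceAll("\\p{Punct}|\\d", "").toLowerCase();
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                // Words that do not start with "cons" are simply dropped
                if (isConsWord(tokens.nextToken())) {
                    context.write(CONS_COUNT, ONE);
                }
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Only the cons_count key arrives here; its sum is the answer
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
```

Summing is associative, and the reducer's input and output types match. This means SumReducer could also be registered as a combiner to cut down the number of (cons_count, 1) pairs shuffled to the reducer.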
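To sort the words that start with "cons" by frequency, the mapper emits each such word with a count of 1. All of these words must go to the same reducer, which sums the count for each word and then sorts the totals.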
Reducer:
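A minimal sketch of that layout. It again assumes the Hadoop mapreduce API with hadoop-client on the classpath. The names ConsWordFrequency, ConsWordMapper, SortedFrequencyReducer and sortByFrequencyDesc are hypothetical.

The reducer works in two stages. It keeps each word's total in a HashMap, and it writes the sorted list only in cleanup(), which Hadoop calls once after the last reduce() call. The driver sets a single reduce task, so one reducer sees every "cons" word and can sort them globally.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ConsWordFrequency {

    // Highest count first; ties broken alphabetically so the output is deterministic
    public static List<Map.Entry<String, Integer>> sortByFrequencyDesc(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static class ConsWordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().replaceAll("\\p{Punct}|\\d", "").toLowerCase();
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                String token = tokens.nextToken();
                if (token.startsWith("cons")) {   // keep only "cons..." words
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    public static class SortedFrequencyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        // Totals for every word this reducer receives; must fit in memory
        private final Map<String, Integer> counts = new HashMap<>();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            counts.put(key.toString(), sum);   // copy the key: Hadoop reuses the Text object
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Runs once after all keys are reduced, so the full set can be sorted here
            for (Map.Entry<String, Integer> e : sortByFrequencyDesc(counts)) {
                context.write(new Text(e.getKey()), new IntWritable(e.getValue()));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "cons word frequency");
        job.setJarByClass(ConsWordFrequency.class);
        job.setMapperClass(ConsWordMapper.class);
        job.setReducerClass(SortedFrequencyReducer.class);
        job.setNumReduceTasks(1);   // one reducer sees every "cons" word
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This design keeps every distinct "cons" word in one reducer's memory, which is usually acceptable for a single prefix. SortedFrequencyReducer should not be registered as a combiner, because it emits its output only in cleanup(). For very large vocabularies, a common alternative is a second job that swaps key and value and sorts on the count in descending order.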