How do I count a specific word using MapReduce?

Posted by 2ledvvac on 2021-05-29 in Hadoop

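I modified the standard word count program, which counts every word, so that it counts only one specific word.
The reducer and map classes are otherwise the same as in the standard word count. The count comes out too low: the specific word appears many times in my file, but I get a count of 1.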

public class wordcountmapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>   // mapper function implemented.
{
    private final static IntWritable one = new IntWritable(1); // intwritable
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();      // conversion in string
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            if (line.compareTo("Cold") == 0) {  //cold is the specific word to get count for
                output.collect(word, one);      // getting 1 as a count for 'cold' as if its counting only first line 'cold' and not going to next line.
            }
        }
    }
}

zqdjd7g9 #1

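First, your if statement is wrong: it compares the whole line with "Cold". It should compare the tokenized word with "Cold" instead: if(tokenizer.nextToken().equals("Cold")) .
I don't see how, with your current logic, you get a count of 1 for "Cold". Possibly one line of your input consists of nothing but the word "Cold".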
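
To make the difference concrete, here is a minimal plain-Java sketch of the two comparisons. It deliberately uses no Hadoop classes, so it only illustrates the logic inside map(); the class name SpecificWordCount, both method names, and the sample input are made up for illustration and are not from the question. Note that the loop in the question already calls nextToken() once for word.set(...), so calling it a second time inside the if would consume an extra token; the sketch reads each token once and reuses it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class: plain-Java sketch of the comparison logic only.
class SpecificWordCount {

    // Mirrors the check in the question: the whole line is compared with the
    // target, so tokens are counted only on lines that are exactly the target.
    static int countComparingLine(List<String> lines, String target) {
        int count = 0;
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                tokenizer.nextToken();
                if (line.compareTo(target) == 0) {
                    count++;
                }
            }
        }
        return count;
    }

    // Mirrors the suggested fix: each token is read once and compared itself.
    static int countComparingToken(List<String> lines, String target) {
        int count = 0;
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                String token = tokenizer.nextToken(); // read once, then reuse
                if (token.equals(target)) {
                    // In the mapper this is where word.set(token) and
                    // output.collect(word, one) would go.
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed sample input: only the first line is exactly "Cold".
        List<String> lines = Arrays.asList("Cold", "It is Cold today", "Cold and Cold again");
        System.out.println("line comparison:  " + countComparingLine(lines, "Cold"));
        System.out.println("token comparison: " + countComparingToken(lines, "Cold"));
    }
}
```

With this assumed input, the line comparison only matches the first line, which would explain a total of 1, while the token comparison counts every occurrence. The comparison is case-sensitive, so "cold" would not be counted; equalsIgnoreCase is an option if that matters.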
