java - classify words according to their length in the reducer

nfs0ujit posted on 2021-06-04 in Hadoop

I'm new to MapReduce applications. I just want to find the lengths of the words in my dataset and classify them by length as tiny, little, med or great, and finally I want to see how many words in my dataset are tiny, little, med or great, in Java. But I've run into a problem implementing the reducer: when I execute the jar file on the Hadoop cluster, it doesn't return any results. I'd really appreciate it if someone could give me a hand. Below is the reducer code I'm trying to run; I guess it has a lot of mistakes.

public class WordSizeReducer extends Reducer<IntWritable, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();
    IntWritable tin, smal, mediu, bi;
    int t, s, m, b;
    int count;
    Text tiny, small, medium, big;

    public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException{

        for (IntWritable val : values) {
            if (val.get() == 1) {
                tin.set(t);
                t++;
            }
            else if (2 <= val.get() && val.get() <= 4) {
                smal.set(s);
                s++;
            }
            else if (5 <= val.get() && val.get() <= 9) {
                mediu.set(m);
                m++;
            }
            else if (10 <= val.get()) {
                bi.set(b);
                b++;
            }
        }
        context.write(tiny, tin);
        context.write(small, smal);
        context.write(medium, mediu);
        context.write(big, bi); 
    }
}

public class WordSizeMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private IntWritable wordLength = new IntWritable();

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            wordLength.set(tokenizer.nextToken().length());
            context.write(wordLength, one);     
        }
    }
}

jyztefdp1#

tiny, small, medium and big are never initialized, so they will be null.
That means all of your context.write() calls are using a null key.
Obviously that's bad, because you won't be able to tell the counts for the different word sizes apart.
Worse, tin, smal, mediu and bi are never initialized either, which will cause a NullPointerException when you call set() on them (you do initialize result correctly, but then never use it).
(Also, you don't need to keep setting the IntWritables inside your loop; just update t, s, m and b there, and set each IntWritable once at the end, right before the context.write() calls.)
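A minimal sketch of that pattern, reusing the names from your code (only the "tiny" pair is shown; smal, mediu, bi and their Text keys would be handled the same way):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordSizeReducer extends Reducer<IntWritable, IntWritable, Text, IntWritable> {
    // initialize the key/value writables up front so they can never be null
    private final Text tiny = new Text("tiny");
    private final IntWritable tin = new IntWritable();
    private int t;  // plain int counter, updated inside the loop

    public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // ... loop over values here, updating only the plain int counters ...
        tin.set(t);               // set the writable once, after the loop
        context.write(tiny, tin); // write with a non-null Text key
    }
}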
Now for the mapper code you added in your update:
For every word in the input you are writing the key-value pair (length, 1).
The reducer will collect all the values for the same key, so it will be called with, for example:

(2, [1,1,1,1,1,1,1,1])
(3, [1,1,1])

So the reducer only ever sees the value 1, which it mistakenly treats as the word length. In reality, it is the key that is the word length.
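So, with fields initialized as in the sketch above, the body of the loop should test key.get(), not val.get(); the values are only ever 1 and just get added up. Roughly (the bucket boundaries below are the ones from your own code):

        for (IntWritable val : values) {
            // key carries the word length; val is always 1 here
            if (2 <= key.get() && key.get() <= 4) {  // the "small" bucket, for example
                s += val.get();
            }
            // ... same idea for the tiny, medium and big buckets ...
        }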
Now for the stack trace you added in your update:
The error message explains what is wrong: Hadoop cannot find your job classes, so they are never executed at all. The error shows:

java.lang.ClassNotFoundException: WordSize.WordsizeMapper

But your class is called WordSizeMapper (or WordSize.WordSizeMapper if you have an enclosing class). Note the different capitalization, "size" vs "Size"! You need to check how you are invoking Hadoop.
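For example, if the classes really do live in a package (or enclosing class) called WordSize and were built into a jar, the job would normally be launched with the exact, correctly capitalized class name; the jar name here is just a placeholder:

hadoop jar wordsize.jar WordSize.WordSizeTest <input path> <output path>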


hpxqektj2#

No luck. I checked my code again as well and made some fixes, but the result is the same: in the Hadoop terminal window I still don't get any results. The latest version of the code is below:

public class WordSizeTest {
    public static void main(String[] args) throws Exception{
        if(args.length != 2)
        {
            System.err.println("Usage: Word Size <in> <out>");
            System.exit(2);
        } 
        Job job = new Job();    
        job.setJarByClass(WordSizeTest.class); 
        job.setJobName("Word Size");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(WordSizeMapper.class); 
        job.setReducerClass(WordSizeReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class); 
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
public class WordSizeMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    final static IntWritable one = new IntWritable(1);
    IntWritable wordLength = new IntWritable();
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            wordLength.set(tokenizer.nextToken().length());
            context.write(wordLength, one);     
        }
    }
}
public class WordSizeReducer extends Reducer<IntWritable, IntWritable, Text, IntWritable>{
    IntWritable tin = new IntWritable();
    IntWritable smal = new IntWritable();
    IntWritable mediu = new IntWritable();
    IntWritable bi = new IntWritable();
    int t, s, m, b;
    Text tiny = new Text("tiny");
    Text small = new Text("small");
    Text medium = new Text("medium");
    Text big = new Text("big");
    public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException{        
        for (IntWritable val : values) {
            if (key.get() == 1) {
                t += val.get();
            }
            else if (2 <= key.get() && key.get() <= 4) {
                s += val.get();
            }
            else if (5 <= key.get() && key.get() <= 9) {
                m += val.get();
            }
            else if (10 <= key.get()) {
                b += val.get();
            }
        }
        tin.set(t); 
        smal.set(s);
        mediu.set(m);
        bi.set(b);
        context.write(tiny, tin);
        context.write(small, smal);
        context.write(medium, mediu);
        context.write(big, bi); 
    }
}

The error on the terminal looks like this:

15/02/01 12:09:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/02/01 12:09:25 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/02/01 12:09:25 INFO input.FileInputFormat: Total input paths to process : 925
15/02/01 12:09:25 WARN snappy.LoadSnappy: Snappy native library is available
15/02/01 12:09:25 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/02/01 12:09:25 INFO snappy.LoadSnappy: Snappy native library loaded
15/02/01 12:09:29 INFO mapred.JobClient: Running job: job_201501191143_0177
15/02/01 12:09:30 INFO mapred.JobClient:  map 0% reduce 0%
15/02/01 12:09:47 INFO mapred.JobClient: Task Id : attempt_201501191143_0177_m_000001_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: WordSize.WordSizeMapper
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:859)
    at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:718)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(AccessController.java:310)
    at javax.security.auth.Subject.doAs(Subject.java:573)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassNotFoundException: WordSize.WordsizeMapper
    at java.lang.Class.forName(Class.java:174)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:812)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
    ... 8 more
15/02/01 12:09:49 INFO mapred.JobClient: Task Id : attempt_201501191143_0177_m_000000_0, Status : FAILED
