How do I print only some tokens of a line in MapReduce?

ff29svar posted on 2021-06-01 in Hadoop

I am writing a Map function. I have a text file:

364.2   366.6   365.2   0   0   1   10421
364.2   366.6   365.2   0   0   1   10422

I want to output columns 1 and 3. Here is my code, but it prints the whole line.

public static class SumMap extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text str = new Text();

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer lineIter = new StringTokenizer(value.toString(), "\\r?\\n");
        while (lineIter.hasMoreTokens()) {
            StringTokenizer tokenIter = new StringTokenizer(lineIter.nextToken(), "\\s+");
            while (tokenIter.hasMoreTokens()) {
                String v1 = tokenIter.nextToken();
                String v2 = tokenIter.nextToken();
                String c1 = tokenIter.nextToken();
                String c2 = tokenIter.nextToken();
                str.set(v1+c1);
                context.write(str, one);
            }

        }
    }
}

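In this code, the first tokenizer is supposed to split the input into lines ("\\r?\\n"), and the second is supposed to split each line into tokens (numbers or strings) on whitespace ("\\s+"). Finally, it should print v1+c1. How should I change the code?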

qvtsj1bj #1

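If you use TextInputFormat, the map key is the byte offset of the line within the file and the value is the content of that line. The framework already hands you one line per call, so you do not need to split by line yourself. Just split each line into columns: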

@Override
protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    String[] vals = value.toString().split("\\s+");
    if (vals.length == 7) {
        context.write(new Text(vals[0] + vals[2]), one);
    }

}
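
To make the column selection easier to try outside a cluster, here is a minimal sketch in plain Java of the same split-and-pick logic applied to one sample line from the question. The class name `ColumnSplitDemo` and the helper `firstAndThird` are hypothetical names used only for illustration, and the `trim()` call is an added assumption to guard against leading whitespace, which would otherwise produce an empty first element.

```java
class ColumnSplitDemo {
    // Hypothetical helper: returns column 1 and column 3 concatenated,
    // or null when the line does not have exactly 7 whitespace-separated columns.
    static String firstAndThird(String line) {
        // String.split takes a regex, so "\\s+" matches one or more whitespace characters.
        String[] vals = line.trim().split("\\s+");
        if (vals.length != 7) {
            return null;
        }
        return vals[0] + vals[2];
    }

    public static void main(String[] args) {
        String line = "364.2   366.6   365.2   0   0   1   10421";
        System.out.println(firstAndThird(line));          // prints 364.2365.2
        System.out.println(firstAndThird("364.2 366.6")); // prints null (too few columns)
    }
}
```

In the mapper, the string returned here corresponds to the key written with `context.write`.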

jecbmhm3 #2

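The problem is a mismatch between the number of tokens produced and the number of tokens consumed. In the inner while loop, each line contains 7 tokens, but you consume only 4 of them per iteration. What you need to do is consume all of the tokens of a line in one pass. Since you only need columns 1 and 3, retrieve those and store them separately.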

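public static class SumMap extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text str = new Text();

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Each call receives a single line, so tokenize the value directly.
        // StringTokenizer treats its delimiter argument as a set of characters, not a regex,
        // so rely on the default delimiters (whitespace) instead of passing "\\s+".
        StringTokenizer tokenIter = new StringTokenizer(value.toString());
        // Only process lines that have exactly 7 columns, then consume all of them.
        if (tokenIter.countTokens() == 7) {
            String c1 = tokenIter.nextToken();
            String c2 = tokenIter.nextToken();
            String c3 = tokenIter.nextToken();
            String c4 = tokenIter.nextToken();
            String c5 = tokenIter.nextToken();
            String c6 = tokenIter.nextToken();
            String c7 = tokenIter.nextToken();
            // Emit column 1 concatenated with column 3 as the key.
            str.set(c1 + c3);
            context.write(str, one);
        }
    }
}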
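
One detail worth checking when counting tokens: `java.util.StringTokenizer` interprets its delimiter argument as a set of individual characters, not as a regular expression, so passing "\\s+" splits on backslash, `s`, and `+` rather than on whitespace. The following is a small sketch in plain Java that compares the token counts for a sample line from the question; the class name `TokenizerDelimiterDemo` and the helper `count` are hypothetical.

```java
import java.util.StringTokenizer;

class TokenizerDelimiterDemo {
    // Hypothetical helper: counts tokens using the given delimiter characters,
    // or the default whitespace delimiters when delims is null.
    static int count(String line, String delims) {
        StringTokenizer t = (delims == null)
                ? new StringTokenizer(line)
                : new StringTokenizer(line, delims);
        return t.countTokens();
    }

    public static void main(String[] args) {
        String line = "364.2   366.6   365.2   0   0   1   10421";
        // The line contains no backslash, 's', or '+', so it stays a single token.
        System.out.println(count(line, "\\s+")); // prints 1
        // The default delimiters are whitespace characters, giving one token per column.
        System.out.println(count(line, null));   // prints 7
    }
}
```

With a single token per line, a second call to `nextToken()` would throw `NoSuchElementException`, which is why checking `countTokens()` before reading is a reasonable guard.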

Main method:

public static void main(String[] args) throws FileNotFoundException, IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "sum");
        job.setJarByClass(SumMR.class);
        job.setMapperClass(SumMap.class);
//        job.setCombinerClass(IntSumReducer.class);
//        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        TextInputFormat.addInputPath(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

This is the modified code. Let me know if you run into any problems!
