MapReduce - reading a file from a provided path

dgtucam1 posted on 2021-05-29 in Hadoop

I am using the code below to read a file from a given path inside the Mapper. This code was suggested in a similar question.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class StubDriver {

        // Driver: expects args[0] = input path and args[1] = output path
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "My Program");
            job.setJarByClass(StubDriver.class);
            job.setMapperClass(Map1.class);
            // job.setPartitionerClass(Part1.class);
            // job.setReducerClass(Reducer1.class);
            // job.setNumReduceTasks(3);

            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            TextInputFormat.addInputPath(job, new Path(args[0]));
            TextOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            job.setMapOutputKeyClass(IntWritable.class);
            job.setMapOutputValueClass(Text.class);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }

        // Mapper: emits the third field as the key and the other four fields as the value
        public static class Map1 extends Mapper<LongWritable, Text, IntWritable, Text> {

            @Override
            protected void setup(Context context) throws IOException, InterruptedException {
                // fs.open() opens exactly this path; it does not expand the "*" wildcard
                Path pt = new Path("hdfs://quickstart.cloudera:8020/dhawalhdfs/input/*");
                FileSystem fs = pt.getFileSystem(context.getConfiguration());
                // try-with-resources closes the reader when setup() finishes
                try (BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(pt)))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        System.out.println(line);
                    }
                }
            }

            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer tokenizer = new StringTokenizer(value.toString());

                String a = tokenizer.nextToken();
                String b = tokenizer.nextToken();
                String c = tokenizer.nextToken();
                String d = tokenizer.nextToken();
                String e = tokenizer.nextToken();

                context.write(new IntWritable(Integer.parseInt(c)),
                        new Text(a + "\t" + b + "\t" + d + "\t" + e));
            }
        }
    }
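A side note on setup(): FileSystem.open() opens exactly one path and does not expand wildcards. A path ending in input/* is therefore looked up as a file literally named *. The usual Hadoop pattern is to expand the pattern first with FileSystem.globStatus(Path) and then open each match with fs.open(status.getPath()). The sketch below shows the same expand-then-open idea using the JDK's java.nio.file API instead of Hadoop, so it runs on a local disk without a cluster. GlobReader and readMatchingLines are hypothetical names, and this is an analogy rather than the Hadoop code itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: expands a glob inside one directory, then reads every match.
// Local-disk analog of Hadoop's FileSystem.globStatus(...) followed by fs.open(...).
class GlobReader {

    static List<String> readMatchingLines(Path dir, String glob) throws IOException {
        // Step 1: expand the wildcard into concrete file paths.
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    matches.add(p);
                }
            }
        }
        // Directory listing order is unspecified, so sort for a deterministic result.
        Collections.sort(matches);

        // Step 2: open each concrete path and collect its lines.
        List<String> lines = new ArrayList<>();
        for (Path p : matches) {
            try (BufferedReader br = Files.newBufferedReader(p, StandardCharsets.UTF_8)) {
                String line;
                while ((line = br.readLine()) != null) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("input");
        Files.write(dir.resolve("part-0.txt"), Arrays.asList("a 1", "b 2"), StandardCharsets.UTF_8);
        Files.write(dir.resolve("part-1.txt"), Arrays.asList("c 3"), StandardCharsets.UTF_8);
        Files.write(dir.resolve("notes.log"), Arrays.asList("ignored"), StandardCharsets.UTF_8);

        // Only part-0.txt and part-1.txt match "part-*".
        System.out.println(readMatchingLines(dir, "part-*")); // prints [a 1, b 2, c 3]
    }
}
```

The two steps are kept separate on purpose. Opening a path only works once the pattern has been turned into concrete file names.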

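The code compiles successfully, but I get an error when I submit the job. Since my program already provides the input path, I tried to submit only the output path, like this: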

hadoop jar /home/cloudera/dhawal/MR/Par.jar StubDriver /dhawalhdfs/dhawal000

I still get the same error:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at StubDriver.main(StubDriver.java:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Answer 1 (yqlxgs2m)

This is a simple mistake... :-)

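new Path(args[1])); is the source of the error. You pass only one argument, so the args array has a single element, but the driver tries to read the second element.
In StubDriver you access the arguments like this: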

TextInputFormat.addInputPath(job, new Path(args[0]));
TextOutputFormat.setOutputPath(job, new Path(args[1]));

But when you run the driver, you pass only one argument:

hadoop jar /home/cloudera/dhawal/MR/Par.jar StubDriver /dhawalhdfs/dhawal000
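The failure can be reproduced without Hadoop. The shell splits the command line on whitespace, so /dhawalhdfs/dhawal000 arrives as the single element args[0], and args[1] does not exist. The minimal plain-Java sketch below indexes the array the same way the driver does. ArgsDemo and inputAndOutput are hypothetical names.

```java
// Hypothetical demo: mirrors how StubDriver reads args[0] (input) and args[1] (output).
class ArgsDemo {

    // Returns {input, output} exactly like the driver; throws if args.length < 2.
    static String[] inputAndOutput(String[] args) {
        return new String[] { args[0], args[1] };
    }

    public static void main(String[] unused) {
        String[] oneArg = { "/dhawalhdfs/dhawal000" }; // what the shell passes in the question
        try {
            inputAndOutput(oneArg);
        } catch (ArrayIndexOutOfBoundsException e) {
            // Same exception type as in the stack trace; the message text depends on the JDK version.
            System.out.println("Caught: " + e);
        }
    }
}
```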

Instead, you should pass two space-separated arguments: first the input path, then the output path.

hadoop jar /home/cloudera/dhawal/MR/Par.jar StubDriver /dhawalhdfs /dhawal000
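To get a readable message instead of a stack trace, the driver can check args.length before touching args[1]. Below is a minimal sketch of that check in plain Java. The class name DriverArgs, the method checkArgs and the usage text are hypothetical. In the real driver, the same test would sit at the top of main, before new Path(args[0]).

```java
// Hypothetical argument check for a driver that expects: <input path> <output path>
class DriverArgs {

    static final String USAGE =
            "Usage: hadoop jar Par.jar StubDriver <input path> <output path>";

    // Returns null if the arguments are usable, otherwise a message for the user.
    static String checkArgs(String[] args) {
        if (args.length < 2) {
            return "Expected 2 arguments but got " + args.length + ". " + USAGE;
        }
        return null;
    }

    public static void main(String[] args) {
        String error = checkArgs(args);
        if (error != null) {
            System.err.println(error);
            System.exit(2); // non-zero exit status tells the shell the call failed
        }
        // Safe now: both indices exist.
        String input = args[0];
        String output = args[1];
        // In StubDriver these values would be wrapped as new Path(input) and new Path(output).
        System.out.println("input=" + input + ", output=" + output);
    }
}
```

Called with the single argument from the question, this prints the usage line and exits with status 2. It does not throw ArrayIndexOutOfBoundsException.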
