Hadoop "Input path does not exist" — the path looks like a local drive

3gtaxfhh, posted 2021-06-02 in Hadoop

I have a problem with my Hadoop program. I am trying to read a file into the mapper, but I keep getting an error telling me that the file does not exist.
The code is as follows:

Configuration conf = new Configuration();
    //String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    conf.set("mapreduce.job.queuename", "alpha");
    conf.setLong("mapreduce.task.timeout", 1000 * 60 * 60);
    conf.setDouble("mapreduce.job.reduce.slowstart.completedmaps", 0.75);
    conf.set("mapred.textoutputformat.separator", "\t");
    Job job = Job.getInstance(conf); // missing from the snippet as posted: the job must be created from conf before it is configured
    job.setMapperClass(MapperCollector.class);
    // job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(MetaDataReducer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path("/user/myuser/theData.csv"));

    FileSystem hdfs = FileSystem.get(conf); // reuse the job's conf so the same fs.defaultFS applies
    Path outFolder = new Path("/user/myuser/outFolder/");
    if (hdfs.exists(outFolder)) {
        hdfs.delete(outFolder, true); //Delete existing Directory
    }
    FileOutputFormat.setOutputPath(job, outFolder);

    System.exit(job.waitForCompletion(true) ? 0 : 1);

It fails with this error:

Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/myuser/theData.csv
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at myuser.mypackage.GenerateTrainingData.main(GenerateTrainingData.java:82)

The code used to work, but it stopped working after the cluster was restarted. Moreover, I can run `hadoop fs -cat /user/myuser/theData.csv` and it works perfectly fine.
It feels like Hadoop is now looking at the local disk, even though the file is in HDFS. I don't know why this is happening.
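The `file:` scheme in the exception is the telltale sign: a scheme-less path like `/user/myuser/theData.csv` is qualified against whatever the default filesystem is, and when no cluster configuration is visible, Hadoop falls back to the local filesystem (`file:///`). A minimal sketch of that qualification using plain `java.net.URI` (not Hadoop's `Path` class; the `namenode:8020` address is a placeholder, not from the question):

```java
import java.net.URI;

public class DefaultFsDemo {
    public static void main(String[] args) {
        // What Hadoop falls back to when no core-site.xml is on the classpath:
        URI localDefault = URI.create("file:///");
        // What a configured cluster would supply (placeholder host/port):
        URI hdfsDefault  = URI.create("hdfs://namenode:8020/");

        String inputPath = "/user/myuser/theData.csv"; // scheme-less, as in the question

        // A scheme-less path is resolved against the default filesystem's URI:
        System.out.println(localDefault.resolve(inputPath)); // a file: URI, like the one in the exception
        System.out.println(hdfsDefault.resolve(inputPath));  // an hdfs: URI, which is what the job needs
    }
}
```

This is why the same code can work on one launch and fail on another: the path string never changes, only the default filesystem it is resolved against.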

sshcrbum


In case anyone else is as much of an idiot as I was: I was running

java -jar mycode.jar

instead of

hadoop jar mycode.jar

Once I ran it correctly, everything worked flawlessly.
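The reason the launch command matters: `hadoop jar` puts the cluster's configuration directory on the classpath, so `new Configuration()` picks up `fs.defaultFS` from `core-site.xml`; a bare `java -jar` does not, and the default filesystem falls back to `file:///`, which is exactly the `file:` prefix in the exception. A typical `core-site.xml` entry looks like this (the namenode address is a placeholder, not taken from the question):

```xml
<!-- core-site.xml: read by new Configuration() when HADOOP_CONF_DIR is on the classpath -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- placeholder address; substitute your cluster's actual namenode -->
    <value>hdfs://namenode:8020</value>
  </property>
</configuration>
```

An alternative defensive fix is to make the scheme explicit in code, e.g. `new Path("hdfs://namenode:8020/user/myuser/theData.csv")`, though relying on the cluster config via `hadoop jar` is the usual practice.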
