Hadoop DistributedCache throws FileNotFoundException

eoxn13cs asked on 2021-06-02 in Hadoop

I am trying to count only the words from an input file that also appear in a listOfWords file. I am getting a FileNotFoundException even though I have verified that the file is in the correct location in HDFS.
In the driver:

Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("/user/training/listOfWords"), conf);
Job job = new Job(conf, "CountEachWord Job");

In the mapper:

private Path[] ref_file;
ArrayList<String> globalList = new ArrayList<String>();

public void setup(Context context) throws IOException{

    this.ref_file = DistributedCache.getLocalCacheFiles(context.getConfiguration());

    FileSystem fs = FileSystem.get(context.getConfiguration());

    FSDataInputStream in_file = fs.open(ref_file[0]);
    System.out.println("File opened");

    BufferedReader br  = new BufferedReader(new InputStreamReader(in_file));//each line of reference file
    System.out.println("BufferReader invoked");

    String eachLine = null;
    while((eachLine = br.readLine()) != null)
    {
        System.out.println("eachLine is: "+ eachLine);
        globalList.add(eachLine);

    }

}

Error message:

hadoop jar CountOnlyMatchWords.jar CountEachWordDriver Rhymes CountMatchWordsOut1
 Warning: $HADOOP_HOME is deprecated.

14/10/07 22:28:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/10/07 22:28:59 INFO input.FileInputFormat: Total input paths to process : 1
14/10/07 22:28:59 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/10/07 22:28:59 WARN snappy.LoadSnappy: Snappy native library not loaded
14/10/07 22:29:00 INFO mapred.JobClient: Running job: job_201409300531_0041
14/10/07 22:29:01 INFO mapred.JobClient:  map 0% reduce 0%
14/10/07 22:29:14 INFO mapred.JobClient: Task Id : attempt_201409300531_0041_m_000000_0, Status : FAILED
java.io.FileNotFoundException: File does not exist: /home/training/hadoop-temp/mapred/local/taskTracker/distcache/5910352135771601888_2043607380_1633197895/localhost/user/training/listOfWords

I have verified that the file above exists in HDFS. I also tried using the local job runner, but it still doesn't work.
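For reference, one way to double-check the path programmatically is a minimal sketch like this (same path as in the driver code above, using the standard FileSystem API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// Prints true when the cache file really is at the expected HDFS location
System.out.println(fs.exists(new Path("/user/training/listOfWords")));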

ecfdbz9o 1#
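Here job is the job Configuration and centers is an ArrayList<String> field; the snippet comes from a k-means style job that loads a centroid file, but the pattern is the same: read the file through the URI returned by DistributedCache.getCacheFiles rather than a localized path.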

try {
        // Fetch the cached file URIs (here, a centroid file) from the distributed cache
        URI[] cacheFiles = DistributedCache.getCacheFiles(job);
        if (cacheFiles != null && cacheFiles.length > 0) {
            // Only dereference cacheFiles[0] after the null/empty check
            Path getPath = new Path(cacheFiles[0].getPath());
            FileSystem fs = FileSystem.get(job);
            String line;
            centers.clear(); // clear the centers list each time
            BufferedReader cacheBufferReader = new BufferedReader(new InputStreamReader(fs.open(getPath)));
            try {
                while ((line = cacheBufferReader.readLine()) != null) {
                    centers.add(line);
                }
            } catch (IOException e) {
                System.err.println("Exception: " + e);
            } finally {
                cacheBufferReader.close();
            }
        }
    } catch (IOException e) {
        System.err.println("Exception: " + e);
    }
ewm0tg9j 2#

You can try this method to retrieve the cached files:

URI[] files = DistributedCache.getCacheFiles(context.getConfiguration());

You can then iterate over the files.
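A minimal sketch of that iteration, assuming the cache was populated in the driver as in the question and reusing the globalList field from the question's mapper:

protected void setup(Context context) throws IOException {
    Configuration conf = context.getConfiguration();
    URI[] files = DistributedCache.getCacheFiles(conf);
    if (files != null) {
        FileSystem fs = FileSystem.get(conf);
        for (URI file : files) {
            // Open each cached file through its HDFS path, not a local one
            BufferedReader br = new BufferedReader(
                    new InputStreamReader(fs.open(new Path(file.getPath()))));
            String line;
            while ((line = br.readLine()) != null) {
                globalList.add(line);
            }
            br.close();
        }
    }
}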

jgwigjjp 3#

In the main method I use this:

Job job = Job.getInstance();
job.setJarByClass(DistributedCacheExample.class);
job.setJobName("Distributed cache example");
job.addCacheFile(new Path("/user/cloudera/datasets/abc.dat").toUri());

Then in the mapper I used this boilerplate:

protected void setup(Context context) throws IOException, InterruptedException {
    URI[] files = context.getCacheFiles();
    for (URI file : files) {
        if (file.getPath().contains("abc.dat")) {
            Path path = new Path(file);
            // Open by bare file name: the cached file is symlinked into the task's working directory
            BufferedReader reader = new BufferedReader(new FileReader(path.getName()));
            String line = reader.readLine();
            while (line != null) {
                ......
                line = reader.readLine();
            }
            reader.close();
        }
    }
}

I am working with these dependencies:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.3</version>
</dependency>

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>2.7.3</version>
</dependency>

For me the trick was using path.getName() together with FileReader; without it I got a FileNotFoundException. The reason is that files added with job.addCacheFile are symlinked into the task's working directory under their base name, so opening the bare file name works locally, while the full HDFS path does not exist on the local disk.

6ie5vjzr 4#

Try it like this. In the driver:

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path cachefile = new Path("path/to/file");
FileStatus[] list = fs.globStatus(cachefile);
for (FileStatus status : list) {
    DistributedCache.addCacheFile(status.getPath().toUri(), conf);
}
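Since fs.globStatus expands glob patterns, a wildcard such as path/to/part-* would add every matching file to the cache; that is presumably why the driver loops over the returned FileStatus array.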

In the mapper's setup():

public void setup(Context context) throws IOException {
    Configuration conf = context.getConfiguration();
    FileSystem fs = FileSystem.get(conf);
    URI[] cacheFiles = DistributedCache.getCacheFiles(conf);
    Path getPath = new Path(cacheFiles[0].getPath());
    BufferedReader bf = new BufferedReader(new InputStreamReader(fs.open(getPath)));
    String setupData = null;
    while ((setupData = bf.readLine()) != null) {
        System.out.println("Setup Line in reducer " + setupData);
    }
}
