job.addCacheFile throws a FileNotFoundException

yk9xbfzb · posted 2021-05-31 · Hadoop

I'm trying to use job.addCacheFile to add files to the distributed cache in MapReduce for a map-side join, but it throws a FileNotFoundException. I've looked at several similar questions, but none of them match my case. Here is what I'm doing with Hadoop 2.6.5.
In the driver class:

Configuration conf = super.getConf();

// absolute path on HDFS
// not sure if relative path or absolute path matters here
Path fileToBeCached = new Path("/test-data/cacheFiles");

Job job = Job.getInstance(conf);
// 'output' is the job's output Path, defined elsewhere in the driver
output.getFileSystem(conf).delete(output, true);

FileSystem fs = fileToBeCached.getFileSystem(conf);
FileStatus filesStatus = fs.getFileStatus(fileToBeCached);

if (filesStatus.isDirectory()) {
    for (FileStatus f : fs.listStatus(fileToBeCached)) {
        if (f.getPath().getName().startsWith("part")) {
            job.addCacheFile(f.getPath().toUri());
        }
    }
} else {
    job.addCacheFile(fileToBeCached.toUri());
}
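
As a side note (not part of the original post): job.addCacheFile also accepts a URI with a fragment, and the fragment becomes the name of the symlink each task sees in its working directory. A minimal sketch, assuming a hypothetical part file part-r-00000 under /test-data/cacheFiles and a made-up symlink name "records":

// Hypothetical variant: register a single cache file with an explicit symlink name.
// Each task can then open the localized copy via the local name "records".
// new URI(...) throws URISyntaxException, so wrap it or declare it on the method.
job.addCacheFile(new URI("/test-data/cacheFiles/part-r-00000#records"));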

In the mapper class:

public static class Map extends Mapper<Text, Text, Text, Text> {
    private Set<String> recordSet = new HashSet<String>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        URI[] files = context.getCacheFiles();
        if (files.length > 0) {
            for (URI uri : files) {
                System.out.println("Cached file: " + uri);
                File path = new File(uri.getPath());
                loadCache(path);
            }
        }
    }

    private void loadCache(File file) throws IOException {
        recordSet.addAll(FileUtils.readLines(file));
    }
}
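
For comparison only, and not offered as the answer to the error above: another common way to read a cache URI that lives on HDFS is to go through Hadoop's FileSystem API rather than java.io.File. A rough sketch (the helper name loadCacheFromHdfs is made up; the Configuration would come from context.getConfiguration() in setup):

// Hypothetical helper: stream an HDFS-backed cache file line by line.
// Needs org.apache.hadoop.fs.{FileSystem, Path}, java.io.*, java.nio.charset.StandardCharsets.
private void loadCacheFromHdfs(URI uri, Configuration conf) throws IOException {
    Path path = new Path(uri);
    FileSystem fs = path.getFileSystem(conf);   // resolve the filesystem that owns this path
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
        String line;
        while ((line = reader.readLine()) != null) {
            recordSet.add(line);                // same field as in the mapper above
        }
    }
}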

No answers yet.
