I am trying to use job.addCacheFile to add a file to the MapReduce distributed cache for a map-side join, but it throws a FileNotFoundException. I have looked at several similar questions, but none of them fits my case. Here is what I did with Hadoop 2.6.5.
In the driver class:
Configuration conf = super.getConf();

// absolute path on HDFS
// not sure if a relative or an absolute path matters here
Path fileToBeCached = new Path("/test-data/cacheFiles");

Job job = Job.getInstance(conf);

// remove any previous output directory
output.getFileSystem(conf).delete(output, true);

FileSystem fs = fileToBeCached.getFileSystem(conf);
FileStatus filesStatus = fs.getFileStatus(fileToBeCached);
if (filesStatus.isDirectory()) {
    // cache every part file under the directory
    for (FileStatus f : fs.listStatus(fileToBeCached)) {
        if (f.getPath().getName().startsWith("part")) {
            job.addCacheFile(f.getPath().toUri());
        }
    }
} else {
    job.addCacheFile(fileToBeCached.toUri());
}
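For context: output and the rest of the job setup are defined elsewhere in the driver (it extends Configured and implements Tool, hence super.getConf()). The sketch below is only an assumption of what that surrounding boilerplate looks like; the argument indices and the KeyValueTextInputFormat choice are placeholders, not taken verbatim from the actual job.

// Assumed surrounding driver setup (sketch, placeholders only):
Path input = new Path(args[0]);   // job input on HDFS
Path output = new Path(args[1]);  // the 'output' Path deleted above

job.setJarByClass(getClass());
job.setMapperClass(Map.class);
// the mapper reads Text keys/values, so a key-value text input format is assumed
job.setInputFormatClass(KeyValueTextInputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);

FileInputFormat.addInputPath(job, input);
FileOutputFormat.setOutputPath(job, output);

return job.waitForCompletion(true) ? 0 : 1;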
In the mapper class:
public static class Map extends Mapper<Text, Text, Text, Text> {

    private Set<String> recordSet = new HashSet<String>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        URI[] files = context.getCacheFiles();
        if (files.length > 0) {
            for (URI uri : files) {
                System.out.println("Cached file: " + uri);
                File path = new File(uri.getPath());
                loadCache(path);
            }
        }
    }

    // read every line of the cached file into the in-memory set
    private void loadCache(File file) throws IOException {
        recordSet.addAll(FileUtils.readLines(file));
    }
}
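As a debugging aid (not part of the job itself), a small check in setup() along these lines would show whether the path that new File(uri.getPath()) resolves to actually exists on the task node; getCacheFiles() and File.exists() are standard Hadoop/Java calls, the rest is just a sketch.

// Debugging sketch only: print each cache URI and whether it resolves
// to an existing file on the local filesystem of the task node.
URI[] cacheFiles = context.getCacheFiles();
if (cacheFiles != null) {
    for (URI uri : cacheFiles) {
        File local = new File(uri.getPath());
        System.out.println(uri + " -> exists locally? " + local.exists());
    }
}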