Hadoop 1.2.1 - using distributed cache

Posted by jvidinwx on 2021-05-31 in Hadoop


I have developed a Hadoop application that uses the distributed cache. I used Hadoop 2.9.0. Everything works fine in standalone and pseudo-distributed mode.
Driver:

public class MyApp extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("Usage: Myapp -files cache.txt <inputpath> <outputpath>");
            System.exit(-1);
        }

        int res = ToolRunner.run(new Configuration(), new IDS(), args);
        System.exit(res);

...
Mapper:

public class IDSMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void setup(Context context) throws IOException {
        BufferedReader bfr = new BufferedReader(new FileReader(new File("cache.txt")));

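Launch:

sudo bin/hadoop jar MyApp.jar -files cache.txt /input /output

Now I need to measure the execution time on a real Hadoop cluster. Unfortunately, the cluster available to me runs Hadoop 1.2.1. So I created a new Eclipse project, referenced the appropriate Hadoop 1.2.1 jar files, and everything works fine in standalone mode. However, pseudo-distributed mode on Hadoop 1.2.1 fails when trying to read the distributed cache file: a FileNotFoundException is thrown in the mapper class (setup method).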
Do I have to handle distributed cache files in some other way in Hadoop 1.2.1?
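To make the question more concrete, here is a minimal sketch of the kind of lookup I imagine might be needed instead of opening "cache.txt" by its relative name. It assumes the localized paths would come from DistributedCache.getLocalCacheFiles(conf) (the Hadoop 1.x class in org.apache.hadoop.filecache). Whether that is actually the cause of the exception is not confirmed. The class CacheFileLocator and its methods are hypothetical names of my own, and plain strings stand in for Hadoop Path objects so that the sketch runs without any Hadoop jars.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hadoop): finds a readable local copy of a cache file.
class CacheFileLocator {

    // localPaths is an assumption: it stands in for the string form of the Path[]
    // returned by DistributedCache.getLocalCacheFiles(conf) on Hadoop 1.x (may be null).
    static File locate(String fileName, String[] localPaths) throws IOException {
        if (localPaths != null) {
            for (String path : localPaths) {
                File candidate = new File(path);
                // Match on the file name only; the parent directory is chosen by the framework.
                if (candidate.getName().equals(fileName) && candidate.isFile()) {
                    return candidate;
                }
            }
        }
        // Fall back to the current working directory, which is what
        // new File("cache.txt") in the mapper relies on.
        File inWorkingDir = new File(fileName);
        if (inWorkingDir.isFile()) {
            return inWorkingDir;
        }
        throw new FileNotFoundException("No local copy of " + fileName + " found");
    }

    // Reads the whole file into memory and always closes the reader.
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Demo: a temporary file plays the role of the localized cache file.
        File dir = Files.createTempDirectory("localcache").toFile();
        File cached = new File(dir, "cache.txt");
        Files.write(cached.toPath(), Arrays.asList("first line", "second line"));

        File found = locate("cache.txt", new String[] { cached.getPath() });
        System.out.println(found + " -> " + readLines(found));
    }
}
```

In the mapper's setup method, the array passed to locate would be built from the Path[] obtained through context.getConfiguration(), and a null result would have to be handled as shown.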

Java hadoop distributed-cache

Source: https://stackoverflow.com/questions/49913687/hadoop-1-2-1-using-distributed-cache