How do I convert Java Hadoop code to run on EC2?

Posted by xpcnnkqh on 2021-06-04 in Hadoop

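I wrote a driver, a mapper, and a reducer class in Java to run a k-nearest-neighbors algorithm on test data, using the distributed cache to pull in the training set. I tested the code on a Cloudera VM, and it works in pseudo-distributed mode.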
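For readers unfamiliar with the algorithm, the classification step can be sketched in plain Java, without any Hadoop classes. This is only an illustration: the class name `KnnSketch`, the squared Euclidean distance, and the majority vote (ties go to the label that reaches the top count first) are assumptions, since the question does not show the mapper or reducer. In the actual job the training rows would come from the cached CSV files rather than in-memory arrays.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; names and distance metric are assumptions.
final class KnnSketch {

    // Squared Euclidean distance between two feature vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the majority label among the k training rows closest to the query.
    // Assumes train is non-empty and labels has one entry per training row.
    static int classify(double[][] train, int[] labels, double[] query, int k) {
        // Sort training row indices by distance to the query (stable sort).
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < train.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble(i -> squaredDistance(train[i], query)));

        // Count votes among the k nearest rows, tracking the current leader.
        Map<Integer, Integer> votes = new HashMap<>();
        int best = labels[order.get(0)];
        int bestVotes = 0;
        for (int n = 0; n < Math.min(k, order.size()); n++) {
            int label = labels[order.get(n)];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestVotes) {
                bestVotes = count;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] train = {{0, 0}, {0, 1}, {5, 5}, {6, 5}};
        int[] labels = {0, 0, 1, 1};
        System.out.println(classify(train, labels, new double[] {5.5, 5.0}, 3));
    }
}
```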
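I am now going through Amazon's EC2/EMR documentation. It seems there should be an easy way to convert working Java Hadoop code into code that runs on EC2, but I am seeing a whole bunch of custom Amazon import statements and methods that I have never seen before.
Here is a sample of my driver code: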

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class KNNDriverEC2 extends Configured implements Tool {
        public int run(String[] args) throws Exception {

            // Reuse the Configuration that ToolRunner populated, so generic
            // options such as -D are not discarded.
            Configuration conf = getConf();

            conf.setInt("rows", 1000);
            conf.setInt("columns", 613);

            // Ship the training files to every task; the part after '#' is the
            // symlink name created in the task's working directory.
            DistributedCache.createSymlink(conf);
            // might have to start next line with ./!!!
            DistributedCache.addCacheFile(new URI("knn-jg/cache_data/train_sample.csv#train_sample.csv"), conf);
            DistributedCache.addCacheFile(new URI("knn-jg/cache_data/train_labels.csv#train_labels.csv"), conf);
            //DistributedCache.addCacheFile(new URI("cacheData/train_sample.csv"),conf);
            //DistributedCache.addCacheFile(new URI("cacheData/train_labels.csv"),conf);

            Job job = new Job(conf);
            job.setJarByClass(KNNDriverEC2.class);
            job.setJobName("KNN");

            // args[0] is the input path, args[1] is the output path.
            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setMapperClass(KNNMapperEC2.class);
            job.setReducerClass(KNNReducerEC2.class);
            // job.setInputFormatClass(KeyValueTextInputFormat.class);

            job.setMapOutputKeyClass(IntWritable.class);
            job.setMapOutputValueClass(IntWritable.class);

            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(IntWritable.class);

            boolean success = job.waitForCompletion(true);
            return success ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new Configuration(), new KNNDriverEC2(), args);
            System.exit(exitCode);
        }
    }
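One thing that may matter when moving off the VM is that the cache-file URIs in the driver are relative and carry no scheme. The sketch below uses only `java.net.URI` to show how such a string breaks down compared with a fully qualified one. The `s3://my-bucket/...` form and the class name `CacheUriSketch` are hypothetical placeholders, and whether the training files live in S3 or HDFS on the cluster is an assumption not stated in the question.

```java
import java.net.URI;

// Plain-Java illustration; it only parses URI strings and does not touch Hadoop or S3.
final class CacheUriSketch {

    // Summarizes the parts of a cache-file URI: which file system it names (scheme),
    // where the file is (path), and the symlink name requested after '#' (fragment).
    static String describe(String raw) {
        URI uri = URI.create(raw);
        return "scheme=" + uri.getScheme()
                + " path=" + uri.getPath()
                + " fragment=" + uri.getFragment();
    }

    public static void main(String[] args) {
        // Relative URI as written in the driver: no scheme, so no file system is named.
        System.out.println(describe("knn-jg/cache_data/train_sample.csv#train_sample.csv"));
        // Hypothetical fully qualified form; "my-bucket" is a placeholder, not a real bucket.
        System.out.println(describe("s3://my-bucket/knn-jg/cache_data/train_sample.csv#train_sample.csv"));
    }
}
```

The first call reports a null scheme, which means the location is left to whatever default file system the cluster is configured with. If the files are stored somewhere else, spelling out the scheme is probably the safer choice.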

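I got the example to run, but it throws an exception on the line "FileInputFormat.setInputPaths(job, new Path(args[0]));". I am going to try working through the documentation on how arguments are handled, but I have run into so many errors so far that I wonder whether I am still a long way from getting this to work. Thanks for any help.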
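The question does not say which exception is thrown. One possibility, offered only as an assumption, is that fewer than two arguments reach run(), in which case reading args[0] fails with an ArrayIndexOutOfBoundsException. A small guard like the following plain-Java sketch would turn that into a readable usage message. The class and method names (`DriverArgsSketch`, `requireInputAndOutput`) are hypothetical.

```java
// Plain-Java illustration; the class and method names are hypothetical.
final class DriverArgsSketch {

    // Returns {inputPath, outputPath}, or throws with a usage message
    // when the argument count is not exactly two.
    static String[] requireInputAndOutput(String[] args) {
        if (args == null || args.length != 2) {
            int n = (args == null) ? 0 : args.length;
            throw new IllegalArgumentException(
                    "Usage: KNNDriverEC2 <input path> <output path> (got " + n + " argument(s))");
        }
        return new String[] {args[0], args[1]};
    }

    public static void main(String[] args) {
        try {
            String[] paths = requireInputAndOutput(args);
            System.out.println("input=" + paths[0] + " output=" + paths[1]);
        } catch (IllegalArgumentException e) {
            // Report the problem and exit with a non-zero status.
            System.err.println(e.getMessage());
            System.exit(2);
        }
    }
}
```

Calling such a check at the top of run() and logging the arguments actually received would show whether the job step is passing the input and output paths in the positions the driver expects.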
