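I wrote a driver, mapper, and reducer class in Java to run the k-nearest-neighbors algorithm on test data, using the distributed cache to pull in the training set. I tested the code on a Cloudera virtual machine, and it works in pseudo-distributed mode.
I am now going through Amazon's EC2/EMR documentation. It seems there should be an easy way to convert working Java Hadoop code into something that runs on EC2, but I keep seeing a pile of custom Amazon import statements and methods that I have never come across before.
Here is my driver code: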
import java.net.URI;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class KNNDriverEC2 extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // Reuse the Configuration that ToolRunner populated (including any
        // generic options) instead of creating a fresh one.
        Configuration conf = getConf();
        conf.setInt("rows", 1000);
        conf.setInt("columns", 613);
        DistributedCache.createSymlink(conf);
        // The part after '#' is the symlink name the tasks will see.
        // Might have to start the next URIs with ./
        DistributedCache.addCacheFile(new URI("knn-jg/cache_data/train_sample.csv#train_sample.csv"), conf);
        DistributedCache.addCacheFile(new URI("knn-jg/cache_data/train_labels.csv#train_labels.csv"), conf);
        //DistributedCache.addCacheFile(new URI("cacheData/train_sample.csv"),conf);
        //DistributedCache.addCacheFile(new URI("cacheData/train_labels.csv"),conf);
        Job job = new Job(conf);
        job.setJarByClass(KNNDriverEC2.class);
        job.setJobName("KNN");
        // args[0] is the input path, args[1] is the output path.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(KNNMapperEC2.class);
        job.setReducerClass(KNNReducerEC2.class);
        // job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Configuration(), new KNNDriverEC2(), args);
        System.exit(exitCode);
    }
}
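Since I am unsure how the cache file URIs in the driver should be written, here is a small standalone sketch (plain java.net.URI, no Hadoop needed) that shows how such a URI splits into scheme, path, and the "#" fragment that the distributed cache uses as the symlink name. The class name CacheUriSketch, the helper cacheUri, and the bucket "my-bucket" are hypothetical and only for illustration. Whether a relative URI ends up pointing at the right filesystem on EMR is an assumption I have not verified.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class CacheUriSketch {
    // Builds "<baseDir>/<fileName>#<fileName>" so the fragment (the part
    // after '#') can serve as the symlink name in the task directory.
    static URI cacheUri(String baseDir, String fileName) throws URISyntaxException {
        String base = baseDir.endsWith("/") ? baseDir : baseDir + "/";
        return new URI(base + fileName + "#" + fileName);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Relative form, as used in the driver above.
        URI relative = cacheUri("knn-jg/cache_data", "train_sample.csv");
        // Fully qualified form; "my-bucket" is a hypothetical bucket name.
        URI qualified = cacheUri("s3://my-bucket/knn-jg/cache_data", "train_sample.csv");

        System.out.println("relative scheme:  " + relative.getScheme());
        System.out.println("relative path:    " + relative.getPath());
        System.out.println("qualified scheme: " + qualified.getScheme());
        System.out.println("qualified path:   " + qualified.getPath());
        System.out.println("symlink name:     " + qualified.getFragment());
    }
}
```

The relative form carries no scheme, so it is left to the cluster configuration to decide which filesystem it refers to; the qualified form names the filesystem explicitly.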
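I got the sample running, but it throws an exception at the line "FileInputFormat.setInputPaths(job, new Path(args[0]));". I will work through the documentation on how arguments are handled, but I have hit so many errors so far that I wonder whether I am still a long way from getting this to work. Thanks for any help.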
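One thing that might narrow this down: if the exception is an ArrayIndexOutOfBoundsException (an assumption, since I have not confirmed which exception it is), then the job is simply receiving fewer arguments than the driver expects. A minimal sketch of a check that could run at the top of run() is below. The class ArgCheckSketch and the method validate are hypothetical names for illustration, not part of Hadoop or EMR.

```java
public class ArgCheckSketch {
    // Returns null when both paths are present, otherwise a usage message
    // that reports how many arguments actually arrived.
    static String validate(String[] args) {
        if (args == null || args.length < 2) {
            int n = (args == null) ? 0 : args.length;
            return "Usage: KNNDriverEC2 <input path> <output path> (got " + n + " argument(s))";
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = validate(args);
        if (problem != null) {
            // Fail early with a readable message instead of an index error.
            System.err.println(problem);
            System.exit(2);
        }
        System.out.println("input=" + args[0] + " output=" + args[1]);
    }
}
```

Printing the received argument count in the job log would at least show whether the input and output paths are reaching the driver in the positions it assumes.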