我正在学习分布式系统的spark。我运行了这个代码,它是有效的。但我知道它在输入文件中是计数字,但我不明白方法是如何编写的,以及javardd的用途
公共类javawordcount{
public static void main(String[] args) throws Exception {
System.out.print("le programme commence");
//String inputFile = "/mapr/demo.mapr.com/TestMapr/Input/alice.txt";
String inputFile = args[0];
String outputFile = args[1];
// Create a Java Spark Context.
System.out.print("le programme cree un java spark contect");
SparkConf conf = new SparkConf().setAppName("JavaWordCount");
JavaSparkContext sc = new JavaSparkContext(conf);
// Load our input data.
System.out.print("Context créeS");
JavaRDD<String> input = sc.textFile(inputFile);
// map/split each line to multiple words
System.out.print("le programme divise le document en multiple line");
JavaRDD<String> words = input.flatMap(
new FlatMapFunction<String, String>() {
@Override
public Iterable<String> call(String x) {
return Arrays.asList(x.split(" "));
}
}
);
System.out.print("Turn the words into (word, 1) pairse");
// Turn the words into (word, 1) pairs
JavaPairRDD<String, Integer> wordOnePairs = words.mapToPair(
new PairFunction<String, String, Integer>() {
@Override
public Tuple2<String, Integer> call(String x) {
return new Tuple2(x, 1);
}
}
);
System.out.print(" // reduce add the pairs by key to produce counts");
// reduce add the pairs by key to produce counts
JavaPairRDD<String, Integer> counts = wordOnePairs.reduceByKey(
new Function2<Integer, Integer, Integer>() {
@Override
public Integer call(Integer x, Integer y) {
return x + y;
}
}
);
System.out.print(" Save the word count back out to a text file, causing evaluation.");
// Save the word count back out to a text file, causing evaluation.
counts.saveAsTextFile(outputFile);
System.out.println(counts.collect());
sc.close();
}
}
1条答案
按热度按时间ykejflvf1#
正如皮诺桑所提到的,这个问题可能过于笼统,你应该能够在任何spark入门或教程中找到你的答案。
让我给你介绍一些有趣的内容:
spark快速入门指南
apache spark入门,电子书
apachespark简介及示例和用例
免责声明:我为mapr工作这就是为什么我把在线资源放在mapr站点的spark上