Beginner at Spark big data programming (Spark code)

Posted by ghg1uchk on 2021-06-02 in Hadoop

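I am learning Spark for distributed systems. I ran the code below and it works. I know that it counts the words in the input file, but I don't understand how the methods are written or what JavaRDD is used for.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class JavaWordCount {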

public static void main(String[] args) throws Exception {

    System.out.print("le programme commence");
    //String inputFile = "/mapr/demo.mapr.com/TestMapr/Input/alice.txt";
    String inputFile = args[0];
    String outputFile = args[1];
    // Create a Java Spark Context.
    System.out.print("le programme cree un java spark context");

    SparkConf conf = new SparkConf().setAppName("JavaWordCount");
    JavaSparkContext sc = new JavaSparkContext(conf);
    // Load our input data.
    System.out.print("Context créé");

    JavaRDD<String> input = sc.textFile(inputFile);

    // map/split each line to multiple words

    System.out.print("le programme divise le document en multiple line");

    JavaRDD<String> words = input.flatMap(
            new FlatMapFunction<String, String>() {
                @Override
                // Spark 1.x signature; from Spark 2.0 on, call() must return Iterator<String>.
                public Iterable<String> call(String x) {
                    return Arrays.asList(x.split(" "));
                }
            }
    );
    System.out.print("Turn the words into (word, 1) pairs");

    // Turn the words into (word, 1) pairs
    JavaPairRDD<String, Integer> wordOnePairs = words.mapToPair(
            new PairFunction<String, String, Integer>() {
                @Override
                public Tuple2<String, Integer> call(String x) {
                    return new Tuple2<>(x, 1);
                }
            }
    );

    System.out.print("        // reduce add the pairs by key to produce counts");

    // reduce add the pairs by key to produce counts
    JavaPairRDD<String, Integer> counts = wordOnePairs.reduceByKey(
            new Function2<Integer, Integer, Integer>() {
                @Override
                public Integer call(Integer x, Integer y) {
                    return x + y;
                }
            }
    );

    System.out.print(" Save the word count back out to a text file, causing evaluation.");

    // Save the word count back out to a text file, causing evaluation.
    counts.saveAsTextFile(outputFile);
    System.out.println(counts.collect());
    sc.close();
}

}

hadoop apache-spark mapr

Source: https://stackoverflow.com/questions/36217698/begenner-at-spark-big-data-programming-spark-code

1 Answer

ykejflvf1#

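As 皮诺桑 mentioned, this question is probably too broad, and you should be able to find your answer in any Spark getting-started guide or tutorial.
Let me point you to some interesting resources:
Spark Quick Start guide
Getting Started with Apache Spark, ebook
Introduction to Apache Spark with Examples and Use Cases
Disclaimer: I work for MapR, which is why I am pointing to the online Spark resources on the MapR site.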
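
To make the three steps in the question's code more concrete, here is a minimal plain-Java sketch of what they compute, written without Spark so it can be run anywhere. It is only an analogy, not how Spark executes the job: a JavaRDD<String> is the Java API for a resilient distributed dataset, meaning a collection whose elements are partitioned across the cluster, whereas this sketch uses an ordinary in-memory List. The class name WordCountSketch and the method wordCount are hypothetical names chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountSketch {

    // Plain-Java analogue of the Spark job: no cluster and no Spark dependency.
    public static Map<String, Integer> wordCount(List<String> lines) {
        // Step 1 (like flatMap): each line yields several words,
        // and all words are flattened into a single collection.
        List<String> words = lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());

        // Step 2 (like mapToPair): each word becomes a (word, 1) pair.
        List<SimpleEntry<String, Integer>> pairs = words.stream()
                .map(w -> new SimpleEntry<String, Integer>(w, 1))
                .collect(Collectors.toList());

        // Step 3 (like reduceByKey): values that share a key are combined
        // with the same function as in the question, x + y.
        Map<String, Integer> counts = new TreeMap<>();
        for (SimpleEntry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(wordCount(Arrays.asList("to be or", "not to be")));
    }
}
```

In the Spark version, the anonymous classes (FlatMapFunction, PairFunction, Function2) play the role of the lambdas here: each one wraps a single call method that Spark applies to the elements of the RDD on the worker nodes.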
