Apache Spark: reading a GRIB2 file into an RDD

Posted by sigwle7e on 2021-05-29 in Hadoop

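Is it possible to read a GRIB2 file from HDFS into an RDD through the Spark API? I found JavaSparkContext.binaryFiles, but the RDD it returns contains raw binary data (not human-readable). I am using Spark 1.6.1 with the Java API. Thank you!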

String inputFile = "hdfs://hdfs:8020/data/testdata.bin";
SparkConf sparkConf = SparkConfFactory.createSparkConf("WeatherData");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
JavaPairRDD<String, PortableDataStream> inputRdd = sc.binaryFiles(inputFile);

List<Tuple2<String, PortableDataStream>> asList = inputRdd.collect();
for (Tuple2<String, PortableDataStream> a : asList) {
    System.out.println(a._1());                                   // Key = file path
    DataInputStream in = new DataInputStream(a._2().open());      // Value = raw byte stream of the file
    BufferedReader d = new BufferedReader(new InputStreamReader(in));

    // GRIB2 is a binary format, so decoding its bytes as text lines prints garbage
    while (d.ready()) {
        System.out.println(d.readLine());                         // Cryptic output
    }
}
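
One possible direction, offered as a rough sketch rather than a confirmed solution: keep the bytes binary instead of wrapping them in a text reader, and cut each file into individual GRIB2 messages. According to the GRIB edition 2 layout, every message begins with a 16-octet indicator section (the ASCII characters "GRIB", two reserved octets, the discipline, the edition number, and the total message length as an 8-octet big-endian integer) and ends with the ASCII characters "7777". The class name Grib2Splitter and the method split below are hypothetical names introduced only for this illustration. The sketch only locates message boundaries; decoding the meteorological fields inside each message would still need a dedicated GRIB library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a byte array into the GRIB2 messages it contains.
public class Grib2Splitter {
    private static final byte[] MAGIC = "GRIB".getBytes(StandardCharsets.US_ASCII);
    private static final int INDICATOR_LENGTH = 16; // size of GRIB2 section 0
    private static final int END_LENGTH = 4;        // size of the "7777" end section

    public static List<byte[]> split(byte[] data) {
        List<byte[]> messages = new ArrayList<>();
        int pos = 0;
        while (pos + INDICATOR_LENGTH <= data.length) {
            if (!isMessageStart(data, pos)) {
                pos++; // skip padding or junk between messages
                continue;
            }
            // Octets 9-16 hold the total message length (ByteBuffer is big-endian by default).
            long length = ByteBuffer.wrap(data, pos + 8, 8).getLong();
            boolean plausible = length >= INDICATOR_LENGTH + END_LENGTH
                    && length <= data.length - pos;
            if (!plausible || !endsWithSevens(data, pos + (int) length)) {
                pos++; // "GRIB" appeared by chance inside other data
                continue;
            }
            int end = pos + (int) length;
            messages.add(Arrays.copyOfRange(data, pos, end));
            pos = end;
        }
        return messages;
    }

    // True if "GRIB" starts at pos and octet 8 says edition 2.
    private static boolean isMessageStart(byte[] data, int pos) {
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[pos + i] != MAGIC[i]) {
                return false;
            }
        }
        return (data[pos + 7] & 0xFF) == 2;
    }

    // True if the four bytes before 'end' are the ASCII characters "7777".
    private static boolean endsWithSevens(byte[] data, int end) {
        for (int i = end - END_LENGTH; i < end; i++) {
            if (data[i] != (byte) '7') {
                return false;
            }
        }
        return true;
    }
}
```

In a Spark job, this could be applied to the values of the JavaPairRDD returned by binaryFiles, for example by passing the result of PortableDataStream.toArray() to split inside a flatMap, which would give an RDD with one byte[] element per GRIB2 message. Note that toArray() loads the whole file into memory on one executor, so this assumes each file fits comfortably in memory.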
