Implementing a Hadoop map with JavaPairRDD

x4shl7ld · posted 2021-05-30 in Hadoop

I have an RDD:

JavaPairRDD<Long, ViewRecord> myRDD

It was created via the newAPIHadoopRDD method. I have an existing map function that I'd like to implement the Spark way:

LongWritable one = new LongWritable(1L);

protected void map(Long key, ViewRecord viewRecord, Context context)
    throws IOException, InterruptedException {

  String url = viewRecord.getUrl();
  long day = viewRecord.getDay();

  // tuple is a reusable KeyValueWritable<Text, LongWritable> field
  tuple.getKey().set(url);
  tuple.getValue().set(day);

  context.write(tuple, one);
}

PS: tuple is an instance of:

KeyValueWritable<Text, LongWritable>

which can be found here: textlong.java
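For readers without access to textlong.java: a KeyValueWritable is presumably just a pair wrapper around two mutable Hadoop Writables, which is why map() can reuse one instance via getKey().set(...). The rough sketch below is an assumption, not the real class; plain mutable holders stand in for Text and LongWritable so it has no Hadoop dependency.

```java
// Hypothetical sketch of a KeyValueWritable-style pair wrapper.
// The real class wraps Hadoop Writables (Text, LongWritable); here
// plain mutable holders stand in so the example is self-contained.
public class KeyValueSketch {
    // Mutable holder standing in for org.apache.hadoop.io.Text
    public static class TextHolder {
        private String value = "";
        public void set(String v) { value = v; }
        public String get() { return value; }
    }

    // Mutable holder standing in for org.apache.hadoop.io.LongWritable
    public static class LongHolder {
        private long value;
        public void set(long v) { value = v; }
        public long get() { return value; }
    }

    private final TextHolder key = new TextHolder();
    private final LongHolder value = new LongHolder();

    public TextHolder getKey() { return key; }
    public LongHolder getValue() { return value; }
}
```

The mutability matters in MapReduce: the mapper allocates one tuple and resets its fields per record instead of allocating a new object per call.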


jvlzgdj91#

I don't know what tuple is, but if you just want to map each record to a pair whose key is (url, day) and whose value is 1L, you can do it like this:

// Java 8 style
JavaPairRDD<Tuple2<String, Long>, Long> result = myRDD
    .values()
    .mapToPair(viewRecord -> {
        String url = viewRecord.getUrl();
        long day = viewRecord.getDay();
        return new Tuple2<>(new Tuple2<>(url, day), 1L);
    });

// Java 7 style (anonymous class instead of a lambda;
// the key type is Tuple2<String, Long> as above)
JavaPairRDD<Tuple2<String, Long>, Long> result = myRDD
        .values()
        .mapToPair(new PairFunction<ViewRecord, Tuple2<String, Long>, Long>() {
            @Override
            public Tuple2<Tuple2<String, Long>, Long> call(ViewRecord record) throws Exception {
                String url = record.getUrl();
                Long day = record.getDay();

                return new Tuple2<>(new Tuple2<>(url, day), 1L);
            }
        });
