How to find the second maximum temperature using secondary sort in Hadoop?

Posted by w46czmvw on 2021-05-29 in Hadoop


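Take secondary sort on year and temperature as an example. We use the year and temperature together as a composite key, and writing only the first key-value pair of each group prints the maximum temperature for that year.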
1900 35°c
1900 34°c
1900 34°c
...
1901 36°c
1901 35°c

protected void reduce(IntPair key, Iterable<NullWritable> values,
Context context) throws IOException, InterruptedException {
    context.write(key, NullWritable.get());
}

Now, if we want to print the second maximum for a given year, how do we do that?

hadoop mapreduce

Source: https://stackoverflow.com/questions/39443871/how-to-find-second-maximum-temperature-using-secondary-sort-in-hadoop

1 answer


wbrvyc0a1#

With this setup you can't. The temperature also needs to be emitted as the value, so the reduce method signature needs to change to:

protected void reduce(IntPair key, Iterable<IntWritable> values,
Context context) throws IOException, InterruptedException {
    ...
}

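The purpose of secondary sort is to use the composite key to sort the values, so using NullWritable as the value prevents it from working. Once the temperatures are in the values, you can iterate over them and skip the first one to get the second maximum. For example: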

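protected void reduce(IntPair key, Iterable<IntWritable> values,
        Context context) throws IOException, InterruptedException {
    // Values arrive sorted from highest to lowest temperature.
    boolean first = true;
    for (IntWritable temp : values) {
        if (!first) {
            // Second value: the second-highest temperature for this year.
            context.write(key, temp);
            return;
        } else {
            // Skip the first value, which is the maximum.
            first = false;
        }
    }
}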
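
The reducer above relies on the framework handing it each year's temperatures in descending order. As a rough illustration of that ordering and grouping, here is a plain-Java simulation that runs without Hadoop. The class `SecondarySortSimulation`, the `YearTemp` stand-in for `IntPair`, and the method `sortAndGroup` are hypothetical names introduced for this sketch, not part of the original job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simulation only: models the sort and grouping steps, not the Hadoop API.
class SecondarySortSimulation {

    // Hypothetical stand-in for the IntPair composite key (year, temperature).
    static final class YearTemp {
        final int year;
        final int temp;
        YearTemp(int year, int temp) { this.year = year; this.temp = temp; }
    }

    // Role of the sort comparator: year ascending, then temperature descending.
    static final Comparator<YearTemp> SORT_ORDER =
        Comparator.comparingInt((YearTemp k) -> k.year)
                  .thenComparing(Comparator.comparingInt((YearTemp k) -> k.temp).reversed());

    // Role of the grouping comparator: records with the same year end up in
    // one group, in the order the sort produced.
    static Map<Integer, List<Integer>> sortAndGroup(List<YearTemp> records) {
        List<YearTemp> sorted = new ArrayList<>(records);
        sorted.sort(SORT_ORDER);
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        for (YearTemp k : sorted) {
            groups.computeIfAbsent(k.year, y -> new ArrayList<>()).add(k.temp);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<YearTemp> input = Arrays.asList(
            new YearTemp(1901, 35), new YearTemp(1900, 34),
            new YearTemp(1900, 35), new YearTemp(1901, 36));
        // Index 1 is the second value in each group, which the reducer writes.
        sortAndGroup(input).forEach((year, temps) ->
            System.out.println(year + " " + temps.get(1)));
    }
}
```

In an actual job these roles would be filled by classes registered through `job.setSortComparatorClass` and `job.setGroupingComparatorClass`, usually together with a partitioner that sends all records for one year to the same reducer.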

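With the temperatures in the values, the reducer iterates over them, skips the first, writes the second, and returns.
Note: this code assumes there are no duplicate temperatures. If a year's maximum appears more than once, the second value written is that same maximum again.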
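
If duplicates are possible, one option is to skip every value equal to the first and write the first value that differs. The following is a minimal sketch of that loop in plain Java, with `Integer` standing in for `IntWritable` so it runs without Hadoop. The class `SecondMaxSketch` and the method `secondDistinctMax` are hypothetical, and the sketch assumes the values arrive in descending order, as the secondary sort would deliver them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical helper: models the body of the reduce loop without Hadoop types.
class SecondMaxSketch {

    // Assumes `descending` is ordered from highest to lowest temperature.
    static OptionalInt secondDistinctMax(Iterable<Integer> descending) {
        Integer max = null;
        for (int temp : descending) {
            if (max == null) {
                max = temp;                  // first value is the maximum
            } else if (temp != max) {
                return OptionalInt.of(temp); // first value below the maximum
            }
        }
        return OptionalInt.empty();          // fewer than two distinct temperatures
    }

    public static void main(String[] args) {
        List<Integer> year1900 = Arrays.asList(35, 35, 34, 34);
        System.out.println(secondDistinctMax(year1900));
    }
}
```

When porting this into a real reducer, copy the maximum out with `temp.get()` into an `int` instead of keeping a reference to the `IntWritable`, because Hadoop reuses the same value object across iterations.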

Answered 2021-05-29
