MapReduce sorting with heap

Posted by lmvvr0a8 on 2021-06-01 in Hadoop


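I am trying to analyze social network data that consists of `follower` and `followee` pairs. I want to use MapReduce to find the top 10 users who have the most followees.
I produced pairs of `userID` and `number_of_followee` with a single MapReduce step.
With this data, however, I am not sure how to sort it in a distributed system.
I don't see how a `priority queue` can be used in the mappers and the reducers, since each of them holds only part of the distributed data.
Can someone explain how to use data structures to sort massive data?
Thank you very much.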

hadoop mapreduce Distributed

Source: https://stackoverflow.com/questions/49379279/mapreduce-sorting-with-heap

2 Answers


ljsrvy3e1#

To sort the data in descending order, you need another MapReduce job. The mapper emits the number of followers as the key and the Twitter handle as the value.

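```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SortingMap extends Mapper<LongWritable, Text, LongWritable, Text> {
    // Reused output objects; named so they do not shadow the map() parameters
    private final LongWritable outKey = new LongWritable(0);
    private final Text outValue = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assuming each input line is "TwitterId<TAB>number_of_followers"
        String[] tokens = value.toString().split("\t");
        if (tokens.length > 1) {
            // Swap the fields: the count becomes the key, the handle the value
            outKey.set(Long.parseLong(tokens[1].trim()));
            outValue.set(tokens[0]);
            context.write(outKey, outValue);
        }
    }
}
```

For the reducer, use `IdentityReducer<K,V>`. With the `org.apache.hadoop.mapreduce` API that this mapper uses, the base `Reducer` class already passes every record through unchanged, so no custom reducer is needed.

```java
// SortedComparator class
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class DescendingOrderKeyComparator extends WritableComparator {
    public DescendingOrderKeyComparator() {
        // Register the key type so serialized keys can be deserialized for comparison
        super(LongWritable.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        // Reverse the natural ascending order
        return -1 * w1.compareTo(w2);
    }
}
```

In the driver class, set the sort comparator (the output is totally ordered only when the job runs a single reducer):

```java
job.setSortComparatorClass(DescendingOrderKeyComparator.class);
```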
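This second job keys each record by its count and sorts the keys in descending order, so the first 10 records of its output answer the original question. The following is a minimal plain-Java sketch of that data flow with no Hadoop dependency; the class name `DescendingSortSketch`, the method `topUsers`, and the sample data are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class DescendingSortSketch {
    // Mimics the second job: order records by count, descending,
    // then keep the users of the first `limit` records.
    static List<String> topUsers(Map<String, Long> followerCounts, int limit) {
        List<Map.Entry<String, Long>> records = new ArrayList<>(followerCounts.entrySet());
        // Plays the role of the descending sort comparator applied during the shuffle.
        records.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()));
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> record : records) {
            if (result.size() == limit) {
                break;
            }
            result.add(record.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: user -> count
        Map<String, Long> counts = Map.of("alice", 50L, "bob", 7L, "carol", 31L, "dave", 90L);
        System.out.println(topUsers(counts, 2));
    }
}
```

Sorting everything is more work than strictly needed for a top-10 query, but it leaves a fully ranked list behind for later use.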

Answered 2021-06-01

mctunoxg2#

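If you have large input files in the format `user_id = number_of_followers`, a simple MapReduce algorithm for finding the top `N` users is:
Each mapper processes its own input, finds the top `N` users in its split, and writes them to a single reducer.
The single reducer receives `number_of_mappers * N` rows and finds the top `N` users among them.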
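The two phases above can be sketched in plain Java with a bounded min-heap, which is where a priority queue fits in: each mapper and the reducer keep at most `N` entries in memory. This is a sketch under assumptions, not a Hadoop job. The class `TopNSketch`, its methods, and the sample data are hypothetical, and it assumes each user's count appears in exactly one split, meaning the counts were already aggregated per user.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class TopNSketch {
    // Keeps the n largest entries using a min-heap that never holds more than n + 1 items.
    static List<Map.Entry<String, Long>> topN(Iterable<Map.Entry<String, Long>> rows, int n) {
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        for (Map.Entry<String, Long> row : rows) {
            heap.offer(row);
            if (heap.size() > n) {
                heap.poll(); // evict the entry with the smallest count
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return result;
    }

    // Simulates the job: every "mapper" reduces its split to a local top n,
    // then one "reducer" merges the at most (mappers * n) candidates.
    static List<Map.Entry<String, Long>> topNAcrossSplits(
            List<List<Map.Entry<String, Long>>> splits, int n) {
        List<Map.Entry<String, Long>> candidates = new ArrayList<>();
        for (List<Map.Entry<String, Long>> split : splits) {
            candidates.addAll(topN(split, n));
        }
        return topN(candidates, n);
    }

    public static void main(String[] args) {
        // Hypothetical sample data: two input splits of (user, count) pairs
        List<List<Map.Entry<String, Long>>> splits = List.of(
                List.of(Map.entry("alice", 50L), Map.entry("bob", 7L), Map.entry("carol", 31L)),
                List.of(Map.entry("dave", 90L), Map.entry("erin", 12L)));
        System.out.println(topNAcrossSplits(splits, 2));
    }
}
```

In an actual Hadoop mapper the heap would typically be filled in `map()` and written out once in `cleanup()`, so each mapper emits only its `N` candidates.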

Answered 2021-06-01
