How to pass an object as a value in Hadoop

Posted by rryofs0p on 2021-06-03 in Hadoop

Does Hadoop allow passing an object (such as a tree) as the output value of a mapper? If so, how?

Answer #1 (bqf10yzr)

To expand on Tariq's link, here is one possible implementation of a Writable TreeMap with <Text, IntWritable> entries:

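import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class TreeMapWritable extends TreeMap<Text, IntWritable>
                             implements Writable {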

    @Override
    public void write(DataOutput out) throws IOException {
        // write out the number of entries
        out.writeInt(size());
        // output each entry pair
        for (Map.Entry<Text, IntWritable> entry : entrySet()) {
            entry.getKey().write(out);
            entry.getValue().write(out);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // clear current contents - hadoop re-uses objects
        // between calls to your map / reduce methods
        clear();

        // read how many items to expect
        int count = in.readInt();
        // deserialize a key and value pair, insert into map
        while (count-- > 0) {
            Text key = new Text();
            key.readFields(in);

            IntWritable value = new IntWritable();
            value.readFields(in);

            put(key, value);
        }
    }
}
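
A quick way to sanity-check a write/readFields pair like the one above is to round-trip it through an in-memory byte stream. The following is only a sketch under stated assumptions: so that it runs without Hadoop on the classpath, it uses a hypothetical `SimpleWritable` interface that mirrors the two methods of `org.apache.hadoop.io.Writable`, and plain `String`/`Integer` entries in place of `Text`/`IntWritable`. The names `SimpleWritable`, `StringIntTreeMap` and `RoundTripDemo` are illustrative and not part of Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for org.apache.hadoop.io.Writable (same two methods).
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Same wire layout as the answer's class: entry count, then key/value pairs.
class StringIntTreeMap extends TreeMap<String, Integer> implements SimpleWritable {
    private static final long serialVersionUID = 1L;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(size());
        for (Map.Entry<String, Integer> entry : entrySet()) {
            out.writeUTF(entry.getKey());
            out.writeInt(entry.getValue());
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        clear(); // the instance may be reused, so drop stale entries first
        int count = in.readInt();
        while (count-- > 0) {
            String key = in.readUTF();
            int value = in.readInt();
            put(key, value);
        }
    }
}

class RoundTripDemo {
    // Serializes src to bytes, then deserializes them into the (possibly reused) dst.
    static void copy(SimpleWritable src, SimpleWritable dst) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                src.write(out);
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                dst.readFields(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringIntTreeMap original = new StringIntTreeMap();
        original.put("banana", 5);
        original.put("apple", 3);

        StringIntTreeMap reused = new StringIntTreeMap();
        reused.put("stale", 99); // simulates leftovers from a previous record

        copy(original, reused);
        System.out.println(reused); // prints {apple=3, banana=5}
    }
}
```

The pre-populated `reused` map is there to show why `readFields` starts with `clear()`: without it, the "stale" entry would survive into the next record.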

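Basically, Hadoop's default serialization factory expects output objects to implement the Writable interface (the readFields and write methods detailed above). This way you can extend almost any class and retrofit it with the serialization methods.
Another option is to enable Java serialization (org.apache.hadoop.io.serializer.JavaSerialization, which uses the default Java serialization mechanism) by setting the io.serializations configuration property, but I wouldn't recommend it.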
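
For context on the Java-serialization alternative: `JavaSerialization` handles classes that implement `java.io.Serializable`, so the value type would not need write/readFields methods at all. The sketch below runs without Hadoop and only shows the standard object-stream round trip such a value would go through. The commented-out driver lines are an assumption about how the property might be set in a job driver, and `JavaSerializationDemo` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.TreeMap;

// Assumption: in a real driver (Hadoop on the classpath) the property could be
// set roughly like this, keeping Writable support alongside Java serialization:
//
//   conf.setStrings("io.serializations",
//       "org.apache.hadoop.io.serializer.WritableSerialization",
//       "org.apache.hadoop.io.serializer.JavaSerialization");

class JavaSerializationDemo {
    // Serializes any java.io.Serializable value with the default Java mechanism.
    static byte[] toBytes(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads back a value previously produced by toBytes.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> tree = new TreeMap<>(); // TreeMap is Serializable
        tree.put("apple", 3);
        tree.put("banana", 5);

        byte[] bytes = toBytes(tree);
        System.out.println(fromBytes(bytes).equals(tree)); // prints true
        System.out.println(bytes.length + " bytes");
    }
}
```

The object stream carries class metadata alongside the data, which is one commonly cited reason this route tends to produce larger output than a hand-written Writable encoding.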
