Java: subtracting two numbers that share the same key in Hadoop

Posted by bgtovc5b on 2021-06-02 in Hadoop

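I have two files in table form.
File 1:
key1 value1
key2 value2
...
File 2:
key1 value3
key2 value4
...
I want the reduce step to output a table like this:
key1 (value1-value3)/value1
key2 (value2-value4)/value2
My map writes out the key and prefixes the value with a character that says whether it came from file1 or file2, but I'm not sure how to write the reduce phase.
My map method is: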

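public void map(LongWritable key, Text val, Context context) throws IOException, InterruptedException
{
    // Each line looks like "key1 value1"; emit the first field as the key
    // (the LongWritable key is only the byte offset of the line).
    String[] keyVal = val.toString().split("\\s+");
    Text outputKey = new Text(keyVal[0]);
    Text outputValue = new Text();
    // The original condition, if ("A"), was a placeholder; one option is to check
    // which file the split comes from ("file1" is an assumed file name).
    String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
    if (fileName.startsWith("file1"))
    {
        outputValue.set("A," + keyVal[1]);
    }
    else
    {
        outputValue.set("B," + keyVal[1]);
    }
    context.write(outputKey, outputValue);
}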
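To make the target output in the question concrete, here is a plain-Java, in-memory sketch (no Hadoop involved) of what the job should compute. It assumes each line is "key value" separated by whitespace and that values are integers; the class name, method names and sample numbers are made up for illustration. Keys that appear in only one file are simply skipped, which is a design choice of this sketch, not something the question specifies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory sketch (hypothetical names) of the result the MapReduce job should produce.
// Assumes "key value" lines separated by whitespace, with integer values.
class ExpectedOutputSketch {

    // Parses "key value" lines into an insertion-ordered map.
    static Map<String, Long> parse(String... lines) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            result.put(parts[0], Long.parseLong(parts[1]));
        }
        return result;
    }

    // For every key present in both files, computes (v1 - v3) / v1 in floating point.
    // Keys missing from either file are skipped.
    static Map<String, Double> relativeDiff(Map<String, Long> file1, Map<String, Long> file2) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : file1.entrySet()) {
            Long other = file2.get(e.getKey());
            if (other != null) {
                double v1 = e.getValue();
                out.put(e.getKey(), (v1 - other) / v1);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for file 1 and file 2.
        Map<String, Long> file1 = parse("key1 100", "key2 40");
        Map<String, Long> file2 = parse("key1 75", "key2 50");
        relativeDiff(file1, file2).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

With the sample data, key1 maps to (100 - 75) / 100 = 0.25 and key2 to (40 - 50) / 40 = -0.25.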
Answer 1, by 5kgi1eie

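I have found NamedVector (from Apache Mahout) very useful in this situation. It gives each value an identity, so you can perform the required operation based on its "name".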

Answer 2, by iih3973s

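This should be fairly simple since you have already tagged the values, although it is a bit confusing at first. I assume the emitted values look like A23 (for file 1) and B139 (for file 2). Code snippet: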

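public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {

    // Use double: with int, (value1 - value3) / value1 truncates toward zero.
    double diff = 0;
    double denominator = 1;
    for (Text val : values) {
        String s = val.toString();
        if (s.startsWith("A")) {
            // Value from file 1: it is added and also used as the divisor
            denominator = Double.parseDouble(s.substring(1));
            diff += denominator;
        } else if (s.startsWith("B")) {
            // Value from file 2: it is subtracted
            diff -= Double.parseDouble(s.substring(1));
        } else {
            // This block shouldn't be reached unless malformed values are emitted
            // Throw an exception or log it
        }
    }
    diff /= denominator;
    // The output value class must be DoubleWritable
    // (Reducer<Text, Text, Text, DoubleWritable> and job.setOutputValueClass(DoubleWritable.class))
    context.write(key, new DoubleWritable(diff));
}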

Hope this helps. But I think your approach only works when the same keys (key1, key2, ...) appear in both files.
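The reducer's arithmetic can be checked outside Hadoop. The following is a plain-Java sketch of the loop over tagged values for a single key; the A/B prefixes follow the answer's assumed value format, while the class and method names are hypothetical. It is order-independent, since a reducer gets a key's values in no guaranteed order, it computes in double because integer division would truncate, and it returns an empty result when a key was seen in only one file, which is the case the caveat above is about.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

// Hadoop-free sketch (hypothetical names) of the reducer's arithmetic for one key.
// Assumes values tagged as "A<number>" (file 1) and "B<number>" (file 2).
class TaggedRatioSketch {

    // Returns (a - b) / a for one key, or empty if either side is missing.
    static OptionalDouble relativeDiff(Iterable<String> taggedValues) {
        Double a = null;
        Double b = null;
        for (String v : taggedValues) {
            if (v.startsWith("A")) {
                a = Double.parseDouble(v.substring(1));
            } else if (v.startsWith("B")) {
                b = Double.parseDouble(v.substring(1));
            } else {
                throw new IllegalArgumentException("Unexpected tag in value: " + v);
            }
        }
        if (a == null || b == null) {
            // The key appeared in only one of the two files.
            return OptionalDouble.empty();
        }
        return OptionalDouble.of((a - b) / a);
    }

    public static void main(String[] args) {
        // Values for one key, in the arbitrary order a reducer might see them.
        List<String> values = Arrays.asList("B75", "A100");
        System.out.println(relativeDiff(values).getAsDouble()); // prints 0.25
        System.out.println((100 - 75) / 100);                   // int division: prints 0
    }
}
```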
Update
When used with the reducer above, the map should look like this:

public void map(LongWritable key, Text val, Context context)
        throws IOException, InterruptedException {
    // FileSplit here is org.apache.hadoop.mapreduce.lib.input.FileSplit
    String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
    // Each line is "key value"; emit the first field as the key.
    // The LongWritable key is only the line's byte offset, so it is not used.
    String[] keyVal = val.toString().split("\\s+");
    Text outputKey = new Text(keyVal[0]);
    Text outputValue = new Text();
    // "fileA" stands for the name of file 1
    if ("fileA".equals(fileName)) {
        outputValue.set("A" + keyVal[1]);
    } else {
        outputValue.set("B" + keyVal[1]);
    }
    context.write(outputKey, outputValue);
}
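The per-line logic of this mapper can likewise be tested without a cluster. In this sketch the class and method names are hypothetical and "fileA" is the answer's placeholder file name; the point it shows is that the emitted key is the first field of the line, not the LongWritable byte offset the mapper receives as its input key, which is what lets records from both files meet in the same reduce call.

```java
// Hadoop-free sketch (hypothetical names) of the mapper's per-line logic.
// "fileA" is the answer's placeholder name for file 1; any other file counts as file 2.
class TagLineSketch {

    // Splits a "key value" line and returns {key, taggedValue}.
    static String[] tag(String fileName, String line) {
        String[] keyVal = line.trim().split("\\s+");
        if (keyVal.length < 2) {
            throw new IllegalArgumentException("Expected \"key value\", got: " + line);
        }
        String tag = "fileA".equals(fileName) ? "A" : "B";
        return new String[] {keyVal[0], tag + keyVal[1]};
    }

    public static void main(String[] args) {
        String[] kv = tag("fileA", "key1 23");
        System.out.println(kv[0] + " -> " + kv[1]);
        kv = tag("fileB", "key1 139");
        System.out.println(kv[0] + " -> " + kv[1]);
    }
}
```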
