Hadoop MapReduce: access mapper output number in reducer

Posted by cxfofazt on 2021-05-29 in Hadoop


I have a mapper that outputs every letter of a sentence as the key, with the number 1 as its value. For example, my mapper outputs "how are you" as

h 1
o 1
w 1
a 1
r 1
e 1
y 1
o 1
u 1

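My reducer uses the 1s to count the occurrences of each letter. For example, it outputs the letter "o" as the key and 2 as its value, because it occurs twice.
My problem is that I want to compute the frequency of each letter in a sentence. To do that I need access to the total number of letters in the sentence (the number of mapper outputs). I'm new to MapReduce, so I'm not sure of the best way to do this.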

Java hadoop mapreduce reducers Mapper

Source: https://stackoverflow.com/questions/48934671/hadoop-mapreduce-access-mapper-output-number-in-reducer

3 Answers


ffx8fchx1#

I solved it myself: use the global counters to read MAP_OUTPUT_RECORDS, which gives the reducer the total number of mapper outputs.
Code:

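// Inside the reducer, e.g. in setup(Context context) or reduce(...).
// Imports: org.apache.hadoop.conf.Configuration; org.apache.hadoop.mapreduce.Cluster, Job, TaskCounter.
Configuration conf = context.getConfiguration();
Cluster cluster = new Cluster(conf);
Job currentJob = cluster.getJob(context.getJobID());
// Job-wide number of records written by all mappers.
long totalCharacters = currentJob.getCounters().findCounter(TaskCounter.MAP_OUTPUT_RECORDS).getValue();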
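
A minimal sketch of how that total might then be used, written in plain Java without Hadoop so that it runs on its own. `CounterFrequency` and `frequency` are hypothetical names, and `totalCharacters` stands in for the counter value read above. Keep in mind that MAP_OUTPUT_RECORDS is a job-wide total, so it only equals the number of letters in the sentence if that one sentence is the job's entire input.

```java
import java.util.Arrays;
import java.util.List;

public class CounterFrequency {
    // Mirrors what reduce() would do for one letter: sum the 1s emitted by
    // the mapper, then divide by the total taken from MAP_OUTPUT_RECORDS.
    static double frequency(List<Integer> ones, long totalCharacters) {
        long count = 0;
        for (int one : ones) {
            count += one;
        }
        // Cast before dividing; long / long would truncate the result to 0.
        return (double) count / totalCharacters;
    }

    public static void main(String[] args) {
        // "how are you": 'o' is emitted twice out of 9 map output records.
        System.out.println(frequency(Arrays.asList(1, 1), 9L));
    }
}
```

In a real reducer the loop would run over the `Iterable<IntWritable>` passed to `reduce`, and the result would typically be written as a `DoubleWritable`.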

2021-05-29

qgzx9mmu2#

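Assuming your mapper receives the complete sentence whose frequencies you are trying to find, and you are using the Java API, you can output two key-value pairs per letter from the mapper via the `context.write(...)` function.
Java signature of the mapper: `public void map(LongWritable key, Text value, Context context)`
Key: `<lineNo_Letter>` ; Value: `c_n`
Key: `<lineNo_Letter>` ; Value: `t_m`
where

lineNo = same as the key passed to the mapper (the first parameter of the function above)
letter = your desired letter
m = <total number of letters in the line (the 2nd parameter of the function above) input to the mapper>
n = <number of occurrences of the letter in the mapper input line (the 2nd parameter of the function above)>

`c_` and `t_` are just prefixes that identify the type of count: `c` marks the number of occurrences of the letter, while `t` marks the total number of letters.
Basically, the concept we are leveraging here is that you can write as many key-value pairs as you want from a mapper/reducer.
Now the reducer will get something like Key: `<lineNo_letter>` ; Value: `ListOf[c_n, t_m]`. Just iterate over the list, split each entry on the separator `_`, and use the identifier prefix (`t` or `c`) to tell the two apart; you then have the required values in the reducer, i.e.

Total number of letters in the sentence = m
Total number of occurrences of the letter = n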
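
The mapper side of this idea can be sketched in plain Java (no Hadoop dependency, so it runs standalone). `TaggedMapperSketch` and `emit` are hypothetical names, each returned string stands for one `context.write(key, value)` call, and lower-casing the input and skipping non-letters are assumptions of this sketch rather than part of the answer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TaggedMapperSketch {
    // Returns the "key<TAB>value" pairs a mapper would write for one line.
    static List<String> emit(long lineNo, String line) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        int total = 0;
        // Assumption: case-insensitive, letters only.
        for (char ch : line.toLowerCase().toCharArray()) {
            if (Character.isLetter(ch)) {
                counts.merge(ch, 1, Integer::sum);
                total++;
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            String key = lineNo + "_" + e.getKey();
            out.add(key + "\t" + "t_" + total);        // total letters in the line
            out.add(key + "\t" + "c_" + e.getValue()); // occurrences of this letter
        }
        return out;
    }

    public static void main(String[] args) {
        for (String record : emit(1, "how are you")) {
            System.out.println(record);
        }
    }
}
```

In an actual job both halves of each record would be wrapped in `Text` before being passed to `context.write`.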


### Edit: added pseudo-logic

Taking your example, suppose the input line to the mapper function `public void map(LongWritable key, Text value, Context context)` is

LongWritable key = 1
Text value = howareyou

The mapper output should be:

// Output the length of the Text value (total letters) against each letter
context.write("1_h", "t_9");
context.write("1_o", "t_9");
context.write("1_w", "t_9");
context.write("1_a", "t_9");
context.write("1_r", "t_9");
context.write("1_e", "t_9");
context.write("1_y", "t_9");
context.write("1_u", "t_9");

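Note that the output above is emitted by the mapper once per distinct letter of the sentence. That is why the letter `o` is output only once (even though it occurs twice in the input).
The mapper code additionally outputs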

// Output the count of each individual letter in the input text
context.write("1_h", "c_1");
context.write("1_o", "c_2");
context.write("1_w", "c_1");
context.write("1_a", "c_1");
context.write("1_r", "c_1");
context.write("1_e", "c_1");
context.write("1_y", "c_1");
context.write("1_u", "c_1");

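Again, you can see that the letter `o` has the value `c_2`, because it occurs twice in the sentence.
Now `reduce` will be invoked 8 times, once per key (these are not necessarily 8 separate reducer tasks), and each invocation gets one of the following key-value pairs: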

key: "1_h" value: ListOf["t_9", "c_1"]
key: "1_o" value: ListOf["t_9", "c_2"]
key: "1_w" value: ListOf["t_9", "c_1"]
key: "1_a" value: ListOf["t_9", "c_1"]
key: "1_r" value: ListOf["t_9", "c_1"]
key: "1_e" value: ListOf["t_9", "c_1"]
key: "1_y" value: ListOf["t_9", "c_1"]
key: "1_u" value: ListOf["t_9", "c_1"]

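Now, in each reduce call, split the key to get the line number and the letter. Iterate over the list of values to extract the total and the letter's occurrence count.
Frequency of the letter `h` in line 1 = `(double) Integer.parseInt("c_1".split("_")[1]) / Integer.parseInt("t_9".split("_")[1])` (the cast avoids integer division, which would yield 0). This is the pseudo-logic for you to implement.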
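
A minimal sketch of that reducer-side parsing, again in plain Java so it runs without Hadoop. `TaggedReducerSketch` and `frequency` are hypothetical names, and the list argument stands for the values received for a single key. Hadoop does not guarantee the order of values within a key, so the sketch dispatches on the prefix instead of on position.

```java
import java.util.Arrays;
import java.util.List;

public class TaggedReducerSketch {
    // Combines the tagged values received for one key, e.g. ["t_9", "c_2"].
    static double frequency(List<String> values) {
        long total = 0;
        long count = 0;
        for (String v : values) {
            String[] parts = v.split("_");
            long n = Long.parseLong(parts[1]);
            if (parts[0].equals("t")) {
                total = n;   // total letters in the line
            } else if (parts[0].equals("c")) {
                count = n;   // occurrences of this letter
            }
        }
        // Cast to double so the division does not truncate to 0.
        return (double) count / total;
    }

    public static void main(String[] args) {
        System.out.println(frequency(Arrays.asList("t_9", "c_2")));
        // The same logic works when the values arrive in the other order.
        System.out.println(frequency(Arrays.asList("c_1", "t_9")));
    }
}
```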

2021-05-29

nwlqm0z13#

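Don't write each letter out as soon as you see it. Count all the characters first, then write the total along with each character.
Then, depending on how you write the values, your reducer will see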

o, [(1,9), (1,9)]

Sum the 1s, take any one of the 9s, then divide the sum by it.
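
A rough sketch of this approach in plain Java (runnable without Hadoop). `PairedValuesSketch`, `valuesFor` and `frequency` are hypothetical names; `valuesFor` simulates the grouped values the reducer would see for one letter, and counting only letters toward the total is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class PairedValuesSketch {
    // Simulates the values the reducer receives for one letter: the line is
    // scanned once for the total, then one {1, total} pair is produced for
    // every occurrence of that letter.
    static List<int[]> valuesFor(char letter, String line) {
        int total = 0;
        for (char ch : line.toCharArray()) {
            if (Character.isLetter(ch)) {
                total++;
            }
        }
        List<int[]> values = new ArrayList<>();
        for (char ch : line.toCharArray()) {
            if (ch == letter) {
                values.add(new int[] {1, total});
            }
        }
        return values;
    }

    // Reducer side: sum the 1s, take the total from any pair, then divide.
    static double frequency(List<int[]> values) {
        int count = 0;
        int total = 0;
        for (int[] pair : values) {
            count += pair[0];
            total = pair[1];
        }
        return (double) count / total;
    }

    public static void main(String[] args) {
        System.out.println(frequency(valuesFor('o', "how are you")));
    }
}
```

In a real job the pair would need a `Writable` representation, for example a `Text` such as `1,9` or a small custom `Writable` class.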

2021-05-29
