Recreating Python dictionary results in MapReduce?

Posted by niknxzdl on 2021-06-01 in Hadoop

I don't understand why standard Python code produces unexpected results when it is converted to MapReduce with mrjob.
Sample data from a .txt file:

1  12
1  14
1  15
1  16
1  18
1  12
2  11
2  11
2  13
3  12
3  15
3  11
3  10

This code builds a dictionary and performs a simple division:

dic = {}

with open('numbers.txt', 'r') as fi:
    for line in fi:
        parts = line.split()
        dic.setdefault(parts[0],[]).append(int(parts[1]))

print(dic)

for k, v in dic.items():
    print (k, 1/len(v), v)

Result:

{'1': [12, 14, 15, 16, 18, 12], '2': [11, 11, 13], '3': [12, 15, 11, 10]}

1 0.16666666666666666 [12, 14, 15, 16, 18, 12]
2 0.3333333333333333 [11, 11, 13]
3 0.25 [12, 15, 11, 10]

But when it is converted to MapReduce with mrjob:

from mrjob.job import MRJob
from mrjob.step import MRStep
from collections import defaultdict

class test(MRJob):

    def steps(self):
        return [MRStep(mapper=self.divided_vals)]

    def divided_vals(self, _, line):

        dic = {}
        parts = line.split() 
        dic.setdefault(parts[0],[]).append(int(parts[1]))

        for k, v in dic.items():
            yield (k, 1/len(v)), v 

if __name__ == '__main__': 
    test.run()

Result:

["2", 1.0]  [11]
["2", 1.0]  [13]
["3", 1.0]  [12]
["3", 1.0]  [15]
["3", 1.0]  [11]
["3", 1.0]  [10]
["1", 1.0]  [12]
["1", 1.0]  [14]
["1", 1.0]  [15]
["1", 1.0]  [16]
["1", 1.0]  [18]
["1", 1.0]  [12]
["2", 1.0]  [11]
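
A likely reason every row shows 1.0: mrjob calls the mapper once per input line, so `dic` is created fresh on each call and never holds more than one value. The sketch below reproduces that behaviour in plain Python, assuming the same function body as the mapper above without the mrjob plumbing; the three-line `lines` list is a shortened stand-in for numbers.txt.

```python
def divided_vals(line):
    """Same body as the mapper in the question, called once per line."""
    dic = {}  # rebuilt on every call, so it only ever sees one line
    parts = line.split()
    dic.setdefault(parts[0], []).append(int(parts[1]))
    for k, v in dic.items():
        # len(v) is always 1 here, so 1 / len(v) is always 1.0
        yield (k, 1 / len(v)), v


# Shortened stand-in for the sample file (assumption for illustration).
lines = ["1  12", "1  14", "2  11"]

# Mimic the framework: one independent mapper call per input line.
results = [pair for line in lines for pair in divided_vals(line)]
for pair in results:
    print(pair)
```

Nothing is shared between calls, so the grouping that the standalone script gets from a single long-lived dictionary never happens inside the mapper.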

Why doesn't MapReduce group and compute in the same way? How can I reproduce the standard Python result in MapReduce?
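
One common way to get the grouped result is to let the mapper emit plain `(key, value)` pairs and to do the per-key work in a reducer, since the framework groups values by key between the two phases. The following is a minimal sketch of that idea in plain Python so it can run without Hadoop or mrjob: `mapper` and `reducer` are written as ordinary functions, and `run_locally` is a hypothetical helper that imitates the shuffle step with a sort and `itertools.groupby`. In an mrjob job the two functions would be methods taking `self`, wired together with `MRStep(mapper=..., reducer=...)`.

```python
from itertools import groupby
from operator import itemgetter


def mapper(_, line):
    """Emit one (key, value) pair per input line; no grouping here."""
    parts = line.split()
    yield parts[0], int(parts[1])


def reducer(key, values):
    """Receive every value for one key and compute 1 / count."""
    vals = list(values)  # values arrives as an iterator
    yield key, (1 / len(vals), vals)


def run_locally(lines):
    """Hypothetical stand-in for the shuffle: sort by key, then group."""
    mapped = [kv for line in lines for kv in mapper(None, line)]
    mapped.sort(key=itemgetter(0))  # stable, so file order is kept per key
    out = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        out.extend(reducer(key, (v for _, v in group)))
    return out


# The sample data from the question, one string per line of the file.
sample = [
    "1  12", "1  14", "1  15", "1  16", "1  18", "1  12",
    "2  11", "2  11", "2  13",
    "3  12", "3  15", "3  11", "3  10",
]

grouped = run_locally(sample)
for key, (ratio, vals) in grouped:
    print(key, ratio, vals)
```

This local run keeps the values in file order because Python's sort is stable. On a real cluster the order of values within a key may differ, so code that depends on that order should sort inside the reducer.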
