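I want to use MapReduce to do matrix multiplication in Python with Hadoop. The goal is to compute A*B. The output should be in a format similar to the input.
The two input matrices A and B are given in the following format: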
A,0,0,0.0
A,0,1,1.0
...
A,1,3,8.0
A,1,4,9.0
B,0,0,0.0
B,0,1,1.0
...
B,4,0,12.0
B,4,1,13.0
A,0,0,0.0 means that the entry at index A(0,0) has the value 0.0; the same applies to B.
Here is my map function:
import sys

# Matrix dimensions from the command line (assumed from the test command
# `map.py 2 3`): number of rows of A, then number of columns of B
a_rows = int(sys.argv[1])
b_cols = int(sys.argv[2])

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    # Split line into array of entry data
    entry = line.split(",")
    # Set row, column, and value for this entry
    row = int(entry[1])
    col = int(entry[2])
    value = float(entry[3])
    # If this is an entry in matrix A...
    if entry[0] == "A":
        # A(row, col) contributes to every output cell (row, k)
        for k in range(b_cols):
            print("{},{}\tA,{},{}".format(row, k, col, value))
    # Otherwise, if this is an entry in matrix B...
    else:
        # B(row, col) contributes to every output cell (i, col)
        for i in range(a_rows):
            print("{},{}\tB,{},{}".format(i, col, row, value))
I would like to know how to write the reduce function. Here is the skeleton I will use:
import sys
import string
import numpy

# number of columns of A/rows of B
n = int(sys.argv[1])
# Create data structures to hold the current row/column values (if needed; your code goes here)
currentkey = None

# input comes from STDIN (stream data that goes to the program)
for line in sys.stdin:
    # Remove leading and trailing whitespace
    line = line.strip()
    # Get key/value
    key, value = line.split('\t', 1)
    # Parse key/value input (your code goes here)
    # If we are still on the same key...
    if key == currentkey:
        # Process key/value pair (your code goes here)
        pass
    # Otherwise, if this is a new key...
    else:
        # If this is a new key and not the first key we've seen
        if currentkey:
            # compute/output result to STDOUT (your code goes here)
            pass
        currentkey = key
        # Process input for new key (your code goes here)
# Compute/output result for the last key (your code goes here)
To run the two functions, I test them on a small test data set with the following command:
cat smalltest.txt | python src/map.py 2 3 | sort -n | python src/reduce.py 5
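The output of the map step is piped through sort -n so that it is sorted by key, and the reducer is then supposed to carry out the matrix computation. What I'm stuck on is writing the reducer function.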
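One way the reducer skeleton above could be filled in is sketched here. It is only a sketch under stated assumptions: the mapper is assumed to emit lines of the form `i,k<TAB>M,j,value` (output cell, then matrix name, shared index j, and value), the input is assumed to be sorted so that equal keys are adjacent, and the `C,i,k,value` output format and the helper names `parse` and `reduce_lines` are made up for illustration. Because entries are paired by index j in dictionaries, the command-line argument n from the skeleton is not needed in this version.

```python
from itertools import groupby


def parse(line):
    """Split 'i,k<TAB>M,j,value' into (key, matrix, j, value)."""
    key, payload = line.strip().split("\t", 1)
    matrix, j, value = payload.split(",")
    return key, matrix, int(j), float(value)


def reduce_lines(lines):
    """Yield one 'C,i,k,value' line per output cell.

    Assumes equal keys are adjacent (the input has been sorted).
    """
    records = (parse(line) for line in lines if line.strip())
    for key, group in groupby(records, key=lambda rec: rec[0]):
        a_vals, b_vals = {}, {}
        for _, matrix, j, value in group:
            # Store A(i, j) or B(j, k) under the shared index j
            (a_vals if matrix == "A" else b_vals)[j] = value
        # C(i, k) = sum over j of A(i, j) * B(j, k); missing entries count as 0
        total = sum((a * b_vals.get(j, 0.0) for j, a in a_vals.items()), 0.0)
        yield "C,{},{}".format(key, total)
```

To use it as src/reduce.py, one could add `import sys` at the top and `for out in reduce_lines(sys.stdin): print(out)` at the bottom of the file.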
2 answers
Answer 1 (uhry853o)
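I'm not sure why you need a reduce step for this. My approach uses numpy: with some string/list/zip gymnastics we can recover the dimensions and the data of the arrays a and b.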
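The code that originally accompanied this answer is not preserved here. A minimal sketch of the same idea, not the answerer's actual code, might look like the following. It assumes every entry of both matrices is listed in the `M,row,col,value` format from the question, so the dimensions can be inferred from the largest indices; the function names `load_matrices` and `product_lines` and the `C,i,k,value` output format are illustrative choices.

```python
import numpy as np


def load_matrices(lines):
    """Build dense arrays A and B from 'M,row,col,value' lines."""
    entries = {"A": [], "B": []}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, row, col, value = line.split(",")
        entries[name].append((int(row), int(col), float(value)))
    matrices = {}
    for name, items in entries.items():
        # zip(*items) turns a list of (row, col, value) into three tuples
        rows, cols, vals = zip(*items)
        # Dimensions are inferred from the largest index seen (an assumption)
        dense = np.zeros((max(rows) + 1, max(cols) + 1))
        dense[list(rows), list(cols)] = vals
        matrices[name] = dense
    return matrices["A"], matrices["B"]


def product_lines(lines):
    """Return the product A @ B as 'C,i,k,value' lines."""
    a, b = load_matrices(lines)
    c = a @ b
    return ["C,{},{},{}".format(i, k, float(c[i, k]))
            for i, k in np.ndindex(c.shape)]
```

This loads both matrices into memory at once, so it only suits inputs that fit on a single machine, which is the trade-off against the MapReduce version.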
Answer 2 (lg40wkob)
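Well, I'll be blunt:
if the lambda function itself takes no syntactic input, as though the string did not exist, that is something you should be thankful for. This code is messy (and I'm not being very harsh), so I don't understand why you are complaining about the automatic reduce.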