How do I separate this Python comma-separated string list into multiple key-value pairs?

Posted by bxgwgixi on 2021-06-02 in Hadoop


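If I have input of the form ((0, 1), A 0 0.0), ((0, 2), A 0 0.0), ((0, 0), A 0 0.0), where (0,1), (0,2), (0,0) are the keys, how do I separate it into multiple key-value pairs?
For example, if I wanted to print the keys and values, the input above should return: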

(0,1) , A 0 0 0.0
(0,2) , A 0 0 0.0
(0,0) , A 0 0 0.0

This output will be used by my reducer function, whose code is:

import sys
import string
import numpy
import re

# number of columns of A/rows of B

n = int(sys.argv[1])

# Create data structures to hold the current row/column values (if needed; your code goes here)

currentkey = None

# input comes from STDIN (stream data that goes to the program)

for line in sys.stdin:
    #print(line)
    #Remove leading and trailing whitespace
    #line = line.strip().replace("(","").replace(")","")
    #re.sub(r"[\(\[].*?[\)\]]", "", line)
    #line = line.strip().translate(None, "()")
    line = line.strip()

    #''.join(line.translate(string.maketrans("()[]"," "*4)).split(' ')[::2])
    print(line)

    #print(line.__class__)

    #Get key/value
    key, value = line.split('\t',1)

    print ("key: " + str(key))
    print ("Value: " + str(value))
    #Parse key/value input (your code goes here)
    # for val in value:
    #   if val[0] == "A":
    #       list_a.append(val)
    #       print(list_a)
    #
    #
    #   else:
    #       list_b.append(val)
    #       print(list_b)

    #If we are still on the same key...
    if key==currentkey:

        #Process key/value pair (your code goes here)
        for a in list_a:
            #remove first two elems so that we're left with value
            a = a[2:]
        print(list_a)
        result_a = list(map(int,result_a))
        for b in list_b:
            b = b[2:]
        print(list_b)
        result_b = list(map(int, result_b))
        #multiply result_a and result_b for current key
        result_ab = [a*b for a,b in zip(result_a,result_b)]
        finalResult = sum(result_ab)

hadoop mapreduce python matrix-multiplication

Source: https://stackoverflow.com/questions/42044775/how-to-separate-this-python-comma-separated-string-list-in-to-multiple-key-value

2 Answers


8tntrjer1#

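Basically, you can extract the values from the tuple. ((0, 1), A 0 0.0) is a tuple, and you can access its elements with tuple[0], tuple[1], and so on.
Here line[0] (that is, (0,1)) is itself a tuple, so it has to be converted to str before it can be concatenated into the final result: finalData = dataRDD.map(lambda line : str(line[0]) + "," + line[1]) Test: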

>>> data = [((0, 1), 'A 0 0.0'), ((0, 2), 'A 0 0.0'), ((0, 0), 'A 0 0.0')]
>>> dataRDD = sc.parallelize(data)
>>> for i in dataRDD.collect():
...     print(i)
... 
((0, 1), 'A 0 0.0')
((0, 2), 'A 0 0.0')
((0, 0), 'A 0 0.0')
>>> finalData = dataRDD.map(lambda line : str(line[0]) + "," + line[1])
>>> for i in finalData.collect():
...     print(i)
... 
(0, 1),A 0 0.0
(0, 2),A 0 0.0
(0, 0),A 0 0.0
>>> finalData.saveAsTextFile('/user/cloudera/test123')
----------
$ hadoop fs -cat /user/cloudera/test123/*
(0, 1),A 0 0.0
(0, 2),A 0 0.0
(0, 0),A 0 0.0
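
If the reducer receives these records as text lines on STDIN rather than as Python tuples, the key arrives as a string such as "(0, 1)". The following is a minimal sketch of turning such a line back into a tuple key and a string value without Spark. It assumes the key and value are separated by a tab, as in the question's `line.split('\t',1)`; the helper name `parse_line` and the sample lines are hypothetical and only serve as illustration.

```python
import ast

def parse_line(line, sep="\t"):
    """Split one 'key<sep>value' text line into (key_tuple, value_string).

    Assumption: the key is a Python tuple literal such as "(0, 1)" and the
    separator is a tab, as in the question's reducer code.
    """
    key_text, value = line.strip().split(sep, 1)
    key = ast.literal_eval(key_text)  # "(0, 1)" becomes the tuple (0, 1)
    return key, value

# Hypothetical sample input standing in for sys.stdin
sample_lines = ["(0, 1)\tA 0 0.0\n", "(0, 2)\tA 0 0.0\n", "(0, 0)\tA 0 0.0\n"]

parsed = [parse_line(line) for line in sample_lines]
for key, value in parsed:
    print(f"{key} , {value}")
```

In a real reducer, `sample_lines` would be replaced by `sys.stdin`. `ast.literal_eval` is used instead of `eval` because it only accepts Python literals.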

2021-06-02

xhv8bpkk2#

You can solve this with simple Python unpacking.

tup = ((0, 1), 'A 0 0.0'), ((0, 2), 'A 0 0.0'), ((0, 0), 'A 0 0.0')
A = []
B = []
for each in tup:
    (x,y),z = each
    A.append((x,y))
    B.append(z)
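
The same unpacking can also collect the values under their keys instead of into two parallel lists, which may be closer to what a reducer needs. This is only a sketch under that assumption; the names `pairs` and `grouped` are hypothetical, and the data is the sample from the question.

```python
from collections import defaultdict

pairs = (((0, 1), 'A 0 0.0'), ((0, 2), 'A 0 0.0'), ((0, 0), 'A 0 0.0'))

# Map each (row, col) key to the list of values seen for it
grouped = defaultdict(list)
for (row, col), value in pairs:
    grouped[(row, col)].append(value)

# Print one "key , value" line per pair
for key, values in grouped.items():
    for value in values:
        print(f"{key} , {value}")
```

Because tuples are hashable, `(row, col)` can be used directly as a dictionary key, so no string conversion is needed until the data is printed.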

2021-06-02
