Why won't Python read stdin input as a dictionary?

Posted by dl5txlt9 on 2021-06-04 in Hadoop

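I'm sure I'm doing something dumb, but here goes. I'm working on an assignment for my Udacity class "Intro to MapReduce and Hadoop". Our task is to write a mapper/reducer that counts the occurrences of a word across our data set (the bodies of forum posts). I have an idea of how to do this, but I cannot get Python to read the stdin data into the reducer as a dictionary.
Here's my approach so far: the mapper reads through the data (in this case embedded in the code) and prints a dictionary of word:count for each forum post: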


#!/usr/bin/python

import sys
import csv
import re
from collections import Counter

def mapper():
    reader = csv.reader(sys.stdin, delimiter='\t')
    writer = csv.writer(sys.stdout, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)

    for line in reader:
        body = line[4]
        #Counter(body)
        words = re.findall(r'\w+', body.lower())
        c = Counter(words)
        #print c.items()
        print dict(c)

test_text = """\"\"\t\"\"\t\"\"\t\"\"\t\"This is one sentence sentence\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"Also one sentence!\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"Hey!\nTwo sentences!\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"One. Two! Three?\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"One Period. Two Sentences\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"Three\nlines, one sentence\n\"\t\"\"
"""

# This function allows you to test the mapper with the provided test string

def main():
    import StringIO
    sys.stdin = StringIO.StringIO(test_text)
    mapper()
    sys.stdin = sys.__stdin__

if __name__ == "__main__":
    main()

The counts for each forum post then go to stdout like this: {'this': 1, 'is': 1, 'one': 1, 'sentence': 2}. The reducer should then read that output from stdin as a dictionary:


#!/usr/bin/python

import sys
from collections import Counter, defaultdict
for line in sys.stdin.readlines():
    print dict(line)

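But that fails with the error message: ValueError: dictionary update sequence element #0 has length 1; 2 is required. If I understand correctly, this means each line is being read as a text string rather than as a dict. How can I get Python to understand that the input line is a dict? I've tried using Counter and defaultdict, but I either hit the same error or had each character read in as an element of a list, which is not what I want either.
Ideally, I want the reducer to read in the dict from each line and then add the values from the next line, so that after the second line the values are {'this':1,'is':1,'one':2,'sentence':3,'also':1}, and so on.
Thanks, JR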
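
The ValueError quoted in the question can be reproduced with a short sketch. It is written for Python 3, whereas the code in the post is Python 2, and the variable names are illustrative only. When `dict()` is given a string it iterates over the characters and expects each element to be a key/value pair, so the very first character, `{`, has length 1 and triggers the error. `ast.literal_eval` is one standard-library way to turn the text of a Python literal back into the object.

```python
import ast

# One line as the reducer would receive it from stdin: plain text, not a dict.
line = "{'this': 1, 'is': 1, 'one': 1, 'sentence': 2}\n"

try:
    dict(line)  # iterates over characters; '{' is not a (key, value) pair
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# ast.literal_eval parses the text of a Python literal into the object itself.
parsed = ast.literal_eval(line.strip())
print(type(parsed).__name__, parsed["sentence"])
```

Unlike `eval`, `ast.literal_eval` only accepts literals (strings, numbers, tuples, lists, dicts, and so on), which makes it a more cautious choice for text arriving on stdin.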

hadoop mapreduce python Dictionary stdin

Source: https://stackoverflow.com/questions/25271254/why-wont-python-read-stdin-input-as-a-dictionary

1 Answer

pgvzfuti1#

Thanks to @keyser, the ast.literal_eval() method worked for me. Here's what I have now:


#!/usr/bin/python

import sys
from collections import Counter, defaultdict
import ast
lineDict = {}
c = Counter()
for line in sys.stdin.readlines():
    lineDict = ast.literal_eval(line)
    c.update(lineDict)
print c.most_common()
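
For readers on Python 3, the same idea could be sketched roughly as follows. `reduce_counts` and `sample_lines` are hypothetical names, not from the post, and the sketch assumes every non-blank input line is the printed form of a dict, as the mapper in the question produces.

```python
import ast
from collections import Counter


def reduce_counts(lines):
    """Sum per-post word counts; each line is assumed to be a dict literal."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Counter.update adds the incoming counts to the running totals.
        totals.update(ast.literal_eval(line))
    return totals


# Two lines shaped like the mapper output described in the question.
sample_lines = [
    "{'this': 1, 'is': 1, 'one': 1, 'sentence': 2}\n",
    "{'also': 1, 'one': 1, 'sentence': 1}\n",
]
totals = reduce_counts(sample_lines)
print(totals.most_common())
```

In a streaming job, `sys.stdin` could be passed in place of `sample_lines`, since a file object yields its lines when iterated.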

Answered on 2021-06-04
