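I am trying to run MapReduce code with Python and Hadoop Streaming. My reducer has to import the nltk and textblob packages. The mapper and reducer scripts run fine on my local (Windows) system, but when I try to submit them to HDFS through Hadoop Streaming, I run into several problems.
It says the nltk and textblob packages cannot be found on the HDFS system. I followed the exact steps from this link, but it gives me the error shown below.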
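Traceback (most recent call last):
  File "", line 75, in __init__
  File "", line 87, in _path_stat
FileNotFoundError: [WinError 3] The system cannot find the path specified: ''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "c:\hadoop-3.2.0\temp\reducer_movies.py", line 3, in <module>
    importer = zipimportx.zipimporter('nltk.mod')
  File "c:\python38-32\lib\site-packages\zipimportx\__init__.py", line 190, in __init__
    zipimporter.zipimporter.__init__(self,archivepath)
  File "", line 81, in __init__
zipimport.ZipImportError: not a Zip file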
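The "not a Zip file" message can be reproduced and checked outside Hadoop. The following is a minimal sketch, not a confirmed diagnosis of the error above: it assumes the archive should contain the package folder at its root, and it uses a hypothetical package named demo_pkg in place of nltk. The helper build_package_zip is also a made-up name for illustration. The standard-library call zipfile.is_zipfile reports whether a file is a valid archive regardless of its extension.

```python
import os
import tempfile
import zipfile


def build_package_zip(package_dir, archive_path):
    """Zip a package directory so the package folder sits at the archive root."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the parent, e.g. demo_pkg/__init__.py
                zf.write(full, os.path.relpath(full, parent))
    return archive_path


# Hypothetical stand-in package (demo_pkg), created in a temporary directory.
workdir = tempfile.mkdtemp()
pkg_dir = os.path.join(workdir, "demo_pkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
    fh.write("VALUE = 42\n")

# A real archive that merely carries a .mod extension.
archive = build_package_zip(pkg_dir, os.path.join(workdir, "demo_pkg.mod"))

# A file with a .mod extension that is not an archive at all.
not_a_zip = os.path.join(workdir, "broken.mod")
with open(not_a_zip, "w") as fh:
    fh.write("plain text, not an archive")

print(zipfile.is_zipfile(archive))
print(zipfile.is_zipfile(not_a_zip))
```

Running the same is_zipfile check against the actual nltk.mod file would show whether the file itself is the problem or whether the path passed to the importer is.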
I tried changing .mod to .zip, but got the following error:
Traceback (most recent call last):
  File "c:\hadoop-3.2.0\temp\reducer_movies.py", line 3, in <module>
    importer = zipimportx.zipimporter('nltk.zip')
  File "c:\python38-32\lib\site-packages\zipimportx\__init__.py", line 190, in __init__
    zipimporter.zipimporter.__init__(self,archivepath)
  File "", line 96, in __init__
AttributeError: can't set attribute
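For comparison, Python's built-in zip import support does not need zipimportx: putting an archive on sys.path is enough for pure-Python packages. The sketch below shows that mechanism with a hypothetical package demo_pkg. Whether it avoids the AttributeError above, and whether nltk and textblob work fully when loaded from an archive, is an assumption that has not been verified here. Compiled extension modules cannot be imported from a zip file.

```python
import os
import sys
import tempfile
import zipfile

# Build a small archive containing a hypothetical pure-Python package.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "demo_pkg.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr(
        "demo_pkg/__init__.py",
        "def greet():\n    return 'hello from zip'\n",
    )

# Placing the archive itself on sys.path lets the normal import
# statement find packages stored at the archive root.
sys.path.insert(0, archive)
import demo_pkg

message = demo_pkg.greet()
print(message)
```

In a streaming job the archive would have to be shipped alongside the reducer so that the path added to sys.path exists on the worker node.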
I am completely stuck on how to proceed. Please see my code below:
#!/usr/bin/env python
import sys
from operator import itemgetter
import zipimportx
# Raw string so that \t and \n in the Windows path are not read as escape sequences
importer = zipimportx.zipimporter(r'C:\hadoop-3.2.0\temp\nltk.zip')
nltk = importer.load_module('nltk')
from nltk.tokenize import word_tokenize
nltk.data.path += ["."]
# Raw string for the same reason as above (\t would otherwise become a tab)
importer = zipimportx.zipimporter(r'C:\hadoop-3.2.0\temp\textblob.zip')
textblob = importer.load_module('textblob')
from textblob import classifiers
from textblob import TextBlob
textblob.data.path += ["."]
current_word = None
current_count = 0
word = None
training = [
('The lion king is an animation movie which is the story of a cub and his wicked uncle','Animation'),
("Shlykov, a hard-working taxi driver and Lyosha, a saxophonist, develop a bizarre love-hate relationship, and despite their prejudices, realize they aren't so different after all",'Drama'),
('The nation of Panem consists of a wealthy Capitol and twelve poorer districts. As punishment for a past rebellion, each district must provide a boy and girl between the ages of 12 and 18 selected by lottery for the annual Hunger Games. The tributes must fight to the death in an arena;','Adventure'),
('Poovalli Induchoodan is sentenced for six years prison life for murdering his classmate. Induchoodan, the only son of Justice Maranchery Karunakara Menon was framed in the case by Manapally Madhavan Nambiar and his crony DYSP Sankaranarayanan to take revenge on idealist judge Menon who had','Action'),
('The Lemon Drop Kid , a New York City swindler, is illegally touting horses at a Florida racetrack. After several successful hustles, the Kid comes across a beautiful, but gullible, woman intending to bet a lot of money. The Kid convinces her to switch her bet, employing a prefabricated con.','Comedy'),
('Seventh-day Adventist Church pastor Michael Chamberlain, his wife Lindy, their two sons, and their nine-week-old daughter Azaria are on a camping holiday in the Outback. With the baby sleeping in their tent, the family is enjoying a barbecue with their fellow campers when a cry is heard.','Crime Fiction'),
("Why was Elsa born with magical powers? The answer is calling her and threatening her kingdom. Together with Anna, Kristoff, Olaf and Sven, she'll set out on a dangerous but remarkable journey. In Frozen, Elsa feared her powers were too much for the world. In Frozen 2, she must hope they are enough.",'Animation'),
('this movie is about the drama of panipat','Drama')
]
# print(training)
classifier = classifiers.NaiveBayesClassifier(training)
def process_text(text):
    blob = TextBlob(text,classifier=classifier)
    return blob.classify()
# input comes from STDIN
for line in sys.stdin:
    line = line.strip()
    lines = line.split('--')
    #print (lines)
    for x in lines:
        genre = process_text(x)
        #print(genre)
        print(x, genre) # This is for Python 3
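The stdin loop above can be exercised on its own, without nltk, textblob, or Hadoop, which helps separate import problems from logic problems. This is only a sketch: classify_stream and stub_classify are hypothetical names, and the stub classifier is a placeholder for process_text, not the Naive Bayes model.

```python
import io


def classify_stream(stream, classify):
    """Split each input line on '--' and pair every piece with its label."""
    results = []
    for line in stream:
        line = line.strip()
        for chunk in line.split('--'):
            results.append((chunk, classify(chunk)))
    return results


def stub_classify(text):
    # Placeholder rule standing in for process_text (assumption, not the real model).
    return 'Animation' if 'lion' in text.lower() else 'Drama'


# io.StringIO simulates what Hadoop Streaming would feed through sys.stdin.
sample = io.StringIO("The lion king--a story of panipat\n")
pairs = classify_stream(sample, stub_classify)
for text, genre in pairs:
    print(text, genre)
```

Passing sys.stdin and process_text instead of the sample stream and the stub would give the same behavior as the original loop.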