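I am trying to process text with Hadoop MapReduce to get the part of speech of each word. There are no errors, but the map function still produces no output.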
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class POSCount {

    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, IntWritable, Text> {

        //private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        // Maps each dictionary word to its part of speech.
        private Map<String, String> wordList = null;

        // Runs once per map task: loads the part-of-speech dictionary from HDFS.
        @Override
        public void setup(Context context) {
            Configuration conf = context.getConfiguration();
            Path pt = new Path("/user/gokul/hw1b/mobyposi.i");
            //Path pt = new Path("/user/gxs161530/mobyposi.i");
            BufferedReader br;
            try {
                //FileSystem fs = FileSystem.get(new Configuration());
                FileSystem fs = FileSystem.get(conf);
                // No charset is given here, so the platform default is used.
                br = new BufferedReader(new InputStreamReader(fs.open(pt)));
                wordList = new HashMap<String, String>();
                String line, word, type;
                char ch;
                while ((line = br.readLine()) != null) {
                    // Each line has the form word×codes.
                    word = line.substring(0, line.indexOf("×"));
                    type = line.substring(line.indexOf("×") + 1);
                    // Every code overwrites the previous entry, so the last code wins.
                    for (int i = 0; i < type.length(); i++) {
                        ch = type.charAt(i);
                        switch (ch) {
                            case 'N': wordList.put(word, "noun");
                                break;
                            case 'p': wordList.put(word, "plural");
                                break;
                            case 'V': wordList.put(word, "verb");
                                break;
                            case 't': wordList.put(word, "verb");
                                break;
                            case 'i': wordList.put(word, "verb");
                                break;
                            case 'A': wordList.put(word, "adjective");
                                break;
                            case 'v': wordList.put(word, "adverb");
                                break;
                            case 'C': wordList.put(word, "conjunction");
                                break;
                            case 'P': wordList.put(word, "preposition");
                                break;
                            case 'r': wordList.put(word, "pronoun");
                                break;
                            case 'D': wordList.put(word, "definite article");
                                break;
                            case 'I': wordList.put(word, "indefinite article");
                                break;
                            case 'o': wordList.put(word, "nominative");
                                break;
                        }
                    }
                }
            } catch (Exception e) {
                // Any failure while loading the dictionary is only printed to the task log.
                e.printStackTrace();
            }
            // return map;
        }

        // Emits (token length, part of speech) for every known token of length >= 5.
        @Override
        public void map(LongWritable key, Text value, Context context
                ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            String token;
            int len = 0;
            try {
                while (itr.hasMoreTokens()) {
                    token = itr.nextToken().trim().toLowerCase();
                    len = token.length();
                    if (wordList.containsKey(token) && len >= 5) {
                        word.set(wordList.get(token));
                        //context.write(new Text(Integer.toString(len)), word);
                        context.write(new IntWritable(len), word);
                    }
                }
            } catch (Exception e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }
    // The reducer and driver were not included in the question.
}
The program above reports 0 map output records. I need (length, part of speech) as the map output. Please tell me where I went wrong.
1 Answer
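Your program is fine. I ran it once on a sample of the file you mentioned in the comments, and it gave me the expected output. But when I ran the same program on the whole file, it gave me an error because of a symbol issue: it could not read the "×" from the input file in a format Java supports. So I copied all the data from that file into a new file (stack.txt) and ran the program again, and it gave me output.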
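A minimal sketch of how such a symbol issue can arise, under the assumption (not confirmed in this thread) that the dictionary file stores the "×" delimiter as the single byte 0xD7 (ISO-8859-1) while the reader decodes it as UTF-8. In that case the decoded line contains no "×", `indexOf` returns -1, and the mapper's `substring(0, -1)` throws a `StringIndexOutOfBoundsException` that the `catch` block in `setup` only prints to the task log. The class name `DelimiterDecoding` and the `parse` helper are hypothetical, and the sketch uses an in-memory byte array instead of HDFS so it runs without Hadoop.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
final class DelimiterDecoding {
    // The multiplication sign used as the field delimiter in the question.
    static final char DELIMITER = '\u00D7';

    // Parses "word×codes" lines, decoding the bytes with the given charset.
    // Lines without a delimiter are skipped instead of crashing substring().
    static Map<String, String> parse(byte[] data, Charset charset) {
        Map<String, String> entries = new HashMap<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                int split = line.indexOf(DELIMITER);
                if (split < 0) {
                    continue; // delimiter not found, e.g. wrong charset
                }
                entries.put(line.substring(0, split), line.substring(split + 1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        // Assumed sample line: "house", the byte 0xD7, then the code "N".
        byte[] sample = {'h', 'o', 'u', 's', 'e', (byte) 0xD7, 'N', '\n'};

        // Decoded as UTF-8, the lone 0xD7 byte is malformed, so no "×" appears.
        System.out.println(parse(sample, StandardCharsets.UTF_8));      // prints {}

        // Decoded as ISO-8859-1, 0xD7 becomes "×" and the line splits cleanly.
        System.out.println(parse(sample, StandardCharsets.ISO_8859_1)); // prints {house=N}
    }
}
```

If this assumption holds for the real file, passing an explicit charset as the second argument of `InputStreamReader` in `setup` would be an alternative to copying the data into a new file. Rethrowing the exception from `setup` rather than only printing it would also make the job fail visibly instead of reporting 0 map output records.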
I used the same mapper as the one given in your question. This is the reducer I used
This is the result I got
Let me know if you still face any problems.