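Hello, I have this file: http://aminer.org/lab-datasets/citation/citation-network1.zip. I need to find the names of authors who appear on publications with exactly two authors and whose names are reversed on at least one of those publications. The mapper I wrote looks like this: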
package bigdatauom;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable one = new IntWritable(1);
    private final Text keyAuthors = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the input on '#' so that each token is one field of a record.
        StringTokenizer authorslinetok = new StringTokenizer(value.toString(), "#");
        while (authorslinetok.hasMoreTokens()) {
            String tempLine = authorslinetok.nextToken();
            // Fields starting with '@' hold the comma-separated author list.
            if (tempLine.charAt(0) == '@') {
                tempLine = tempLine.substring(1).trim();
                StringTokenizer seperateAuthorsTok = new StringTokenizer(tempLine, ",");
                List<String> authors = new ArrayList<>();
                while (seperateAuthorsTok.hasMoreTokens()) {
                    // Trim so stray whitespace does not produce distinct keys.
                    authors.add(seperateAuthorsTok.nextToken().trim());
                }
                // Emit only publications with exactly two authors.
                if (authors.size() == 2) {
                    keyAuthors.set(authors.get(0) + "," + authors.get(1));
                    context.write(keyAuthors, one);
                }
            }
        }
    }
}
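A minimal sketch of how an author line could be turned into an order-independent key, written in plain Java without Hadoop so it can be run on its own. It assumes that "reversed" means the same two authors are listed in the opposite order on another publication, and that author lines look like `#@Name One,Name Two`; both are assumptions, not something confirmed above. `AuthorPairParser` and `PairRecord` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class AuthorPairParser {

    /** Hypothetical holder: canonical key plus the order the names were listed in. */
    public static final class PairRecord {
        public final String key;      // the two names in alphabetical order, tab-separated
        public final boolean inOrder; // true if the line listed them in that same order

        PairRecord(String key, boolean inOrder) {
            this.key = key;
            this.inOrder = inOrder;
        }
    }

    /** Returns a record for lines of the assumed form "#@A,B", otherwise empty. */
    public static Optional<PairRecord> parse(String line) {
        if (line == null || !line.startsWith("#@")) {
            return Optional.empty();
        }
        List<String> authors = new ArrayList<>();
        for (String part : line.substring(2).split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                authors.add(name);
            }
        }
        if (authors.size() != 2) {
            return Optional.empty(); // keep only two-author publications
        }
        String a = authors.get(0);
        String b = authors.get(1);
        int cmp = a.compareTo(b);
        if (cmp == 0) {
            return Optional.empty(); // the same name twice is not a pair
        }
        boolean inOrder = cmp < 0;
        String key = inOrder ? a + "\t" + b : b + "\t" + a;
        return Optional.of(new PairRecord(key, inOrder));
    }
}
```

In a mapper, `key` could serve as the output key and `inOrder` as the output value, so that both orderings of the same pair end up grouped under one key.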
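I need to use two reducers, for example, and I have been working on this project for a week with no results. Any advice is appreciated, thanks in advance!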
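A rough sketch of what the reduce side might look like, again in plain Java so it runs without a cluster. It assumes that "two reducers" means two reduce tasks, that each key is an author pair in a fixed (for example alphabetical) order, and that each value is a flag saying whether a publication listed the two names in that order. `ReversedPairReducerSketch`, `seenInBothOrders`, and `partitionFor` are hypothetical names. The partition formula mirrors the one used by Hadoop's default hash partitioner, but a Hadoop `Text` key hashes differently from a Java `String`, so the actual reducer numbers would differ.

```java
import java.util.Arrays;
import java.util.List;

public class ReversedPairReducerSketch {

    /** Reduce step for one key: true if the pair was seen in both name orders. */
    public static boolean seenInBothOrders(Iterable<Boolean> inOrderFlags) {
        boolean forward = false;
        boolean reversed = false;
        for (boolean flag : inOrderFlags) {
            if (flag) {
                forward = true;
            } else {
                reversed = true;
            }
            if (forward && reversed) {
                return true; // no need to look at the remaining values
            }
        }
        return false;
    }

    /** Hash partitioning: picks which of numReducers tasks receives a key. */
    public static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from the dataset.
        String key = "Alice Smith\tBob Jones";
        List<Boolean> flags = Arrays.asList(true, true, false);
        System.out.println("reducer index: " + partitionFor(key, 2));
        System.out.println("seen in both orders: " + seenInBothOrders(flags));
    }
}
```

Because every value for a given key is sent to the same partition, the per-key check does not need to change when the job runs with two reduce tasks; in a Hadoop driver that count is normally set with `job.setNumReduceTasks(2)`.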