Finding reversed names in a file using Hadoop MapReduce

Posted by v8wbuo2f on 2021-05-29 in Hadoop

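Hello, I have this file: http://aminer.org/lab-datasets/citation/citation-network1.zip. Looking only at publications with exactly two authors, I need to find the names of the authors whose names appear in reversed order in at least one of those publications. The Mapper I wrote looks like this: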

package bigdatauom;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits ("author1,author2", 1) for every record that lists exactly two authors.
public class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable one = new IntWritable(1);
    private final Text keyAuthors = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {

        // Split the input on '#'; the token that starts with '@' holds the authors.
        StringTokenizer authorsLineTok = new StringTokenizer(value.toString(), "#");

        while (authorsLineTok.hasMoreTokens()) {
            String tempLine = authorsLineTok.nextToken();

            if (tempLine.startsWith("@")) {
                // Drop the '@' marker and split the comma-separated author names.
                tempLine = tempLine.substring(1);
                StringTokenizer separateAuthorsTok = new StringTokenizer(tempLine, ",");
                List<String> authors = new ArrayList<>();
                while (separateAuthorsTok.hasMoreTokens()) {
                    // Trim so that "A, B" and "A,B" produce the same names.
                    String author = separateAuthorsTok.nextToken().trim();
                    if (!author.isEmpty()) {
                        authors.add(author);
                    }
                }
                // Keep only publications with exactly two authors.
                if (authors.size() == 2) {
                    keyAuthors.set(authors.get(0) + "," + authors.get(1));
                    context.write(keyAuthors, one);
                }
            }
        }
    }
}
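One way the reversed-order check could be layered on top of this mapper is sketched below in plain Java, without any Hadoop types. It assumes that "reversed" means the same two authors are listed as A,B in one publication and as B,A in another, which is only one reading of the task. The class `AuthorPairs` and all of its methods are hypothetical names introduced for illustration; they do not come from the code above or from Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; none of these names appear in the original mapper. */
final class AuthorPairs {

    /** Returns the trimmed, non-empty names of a "#@A,B" line, or an empty list for any other line. */
    static List<String> parseAuthors(String line) {
        List<String> authors = new ArrayList<>();
        if (!line.startsWith("#@")) {
            return authors;
        }
        for (String name : line.substring(2).split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                authors.add(trimmed);
            }
        }
        return authors;
    }

    /** Order-independent key: the two names in sorted order, joined by a tab. */
    static String pairKey(String first, String second) {
        return first.compareTo(second) <= 0
                ? first + "\t" + second
                : second + "\t" + first;
    }

    /** "F" if the names were already listed in sorted order, "R" otherwise. */
    static String orderTag(String first, String second) {
        return first.compareTo(second) <= 0 ? "F" : "R";
    }

    /** Reducer-side check: true if both listing orders were seen for one pair key. */
    static boolean seenInBothOrders(Iterable<String> tags) {
        boolean forward = false;
        boolean reversed = false;
        for (String tag : tags) {
            if (tag.equals("F")) {
                forward = true;
            } else if (tag.equals("R")) {
                reversed = true;
            }
        }
        return forward && reversed;
    }
}
```

Under these assumptions, the mapper would write `pairKey(...)` as the key and `orderTag(...)` as a `Text` value instead of the constant `1`, and a reducer would output a pair only when `seenInBothOrders` returns true for the values grouped under that key.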

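I need to run this with two reducer instances, and I have been working on this project for a week without results. Any suggestions would be appreciated; thanks in advance!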
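On the two-reducer requirement: in the driver, the reducer count can be set with `job.setNumReduceTasks(2)`. What matters then is that both listings of a pair reach the same reducer. The plain-Java sketch below illustrates this with the formula used by Hadoop's default `HashPartitioner`, applied to a `String` rather than a `Text` key (so the actual partition numbers in a real job would differ). `PartitionSketch` and its methods are hypothetical names, and the order-independent key is an assumption about how the job might be structured, not part of the mapper shown earlier.

```java
/** Hypothetical illustration of how keys are spread over reducers. */
final class PartitionSketch {

    /** Same formula as Hadoop's default HashPartitioner, here on a String key. */
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Order-independent key: the two names in sorted order, joined by a tab. */
    static String pairKey(String first, String second) {
        return first.compareTo(second) <= 0
                ? first + "\t" + second
                : second + "\t" + first;
    }

    public static void main(String[] args) {
        int reducers = 2;
        // Raw keys for the two listing orders are different strings,
        // so nothing guarantees they hash to the same reducer.
        System.out.println(partitionFor("Ann Lee,Bob Roy", reducers));
        System.out.println(partitionFor("Bob Roy,Ann Lee", reducers));
        // The order-independent key is identical for both listings,
        // so both records always reach the same reducer.
        System.out.println(partitionFor(pairKey("Ann Lee", "Bob Roy"), reducers));
        System.out.println(partitionFor(pairKey("Bob Roy", "Ann Lee"), reducers));
    }
}
```

With the raw `"A,B"` string as the key, the two orderings are different keys, so they are never grouped into one reduce call and may not even land on the same reducer; an order-independent key avoids both problems without a custom partitioner.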
