I'm still new to Hadoop and Java, so please bear with me.
I can get MapReduce working with .tsv files, but I can't seem to get it working with .csv files.
The problem is mostly in my Mapper, and I can't see what's wrong. Here is the code:
package question5;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FreqMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        /*
         * When the file is read in, the first line is the header row, which we
         * do not want. Since the input arrives as key-value pairs keyed by byte
         * offset, we only need to skip key 0, as seen below.
         */
        if (key.get() == 0) {
            return;
        } else {
            /*
             * After skipping the header, we extract the necessary data and map
             * it into our desired key-value structure.
             *
             * In this case: channel_title -> likes
             *   channel_title is a Text
             *   likes is an IntWritable
             *
             * The data is split at the comma.
             */
            String line = value.toString();
            Text channel_name = new Text(line.split(",")[3]);
            IntWritable likes = new IntWritable(Integer.parseInt(line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)")[8]));
            context.write(channel_name, likes);
        }
    }
}
The problem occurs at the IntWritable, when I try to access index 8 of the split array: an IndexOutOfBoundsException is thrown. I tested the regex and it works fine, as shown here: https://regex101.com/r/j3p6xq/1
Any suggestions are welcome. Thanks for reading.
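One thing I suspect is that some physical lines simply don't contain nine fields, for example if a quoted field holds an embedded newline, since TextInputFormat splits records on newlines regardless of quoting, so my map() would receive only a fragment of the CSV record. Below is a minimal defensive sketch I'm considering; the CHANNEL_INDEX/LIKES_INDEX constants and the "BadLines" counter name are my own additions, not from any of my course material:

package question5;

import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FreqMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Same quote-aware splitter as above, compiled once instead of per call.
    private static final Pattern CSV_SPLIT =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    private static final int CHANNEL_INDEX = 3; // channel_title column
    private static final int LIKES_INDEX = 8;   // likes column

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (key.get() == 0) {
            return; // skip the header row
        }
        String[] fields = CSV_SPLIT.split(value.toString());
        // Guard: skip lines that don't have enough fields instead of letting
        // the array access throw; count them so I can see how many there are.
        if (fields.length <= LIKES_INDEX) {
            context.getCounter("FreqMapper", "BadLines").increment(1);
            return;
        }
        context.write(new Text(fields[CHANNEL_INDEX]),
                new IntWritable(Integer.parseInt(fields[LIKES_INDEX])));
    }
}

If the BadLines counter comes out non-zero after a run, that would at least confirm that short lines are what triggers the exception, even if skipping them isn't the final fix.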