Keep getting IndexOutOfBoundsException in Hadoop MapReduce

kyxcudwk, posted on 2021-05-27 in Hadoop

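I'm new to Hadoop and Java, so please bear with me.
I can get MapReduce to work with a .tsv file, but I can't seem to get it to work with a .csv file.
The problem is mainly in my mapper, and I can't see what it is.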

The code is below:

package question5;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Mapper;

public class FreqMapper extends Mapper<LongWritable, Text, Text, IntWritable>{

    @Override
    public void map(LongWritable key,Text value,Context context) throws IOException, InterruptedException{
        /*
         * The first line of the input file holds the column headers, which we do not want.
         * Each line arrives as a key-value pair whose key is the line's byte offset
         * in the file, so the header is the record with key 0 and we skip it.
         * */
        if(key.get()==0) {
            return;
        }else {
            /*
             * After skipping the first line, we extract the necessary data to be mapped into our desired
             * key-pair structure.
             * 
             * In this case, channel_title -> likes
             * channel_title being Text data type
             * likes being IntWritable data type
             * 
             * The data is split at the comma.
             * */
            String line = value.toString();
            Text channel_name = new Text(line.split(",")[3]);
            IntWritable likes = new IntWritable(Integer.parseInt(line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)")[8]));
            context.write(channel_name, likes);
        }

    }

}

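The problem occurs at the IntWritable, when I try to access index 8 of the split array: an IndexOutOfBoundsException is thrown. I tested the regular expression and it works fine, as shown at https://regex101.com/r/j3p6xq/1
Any suggestions are welcome. Thanks for reading.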
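
One way to narrow this down is to split each line once and check the array length before indexing. The sketch below is plain Java with no Hadoop dependency; the class `CsvFieldSketch` and its methods `fields` and `intAt` are hypothetical names, not part of the mapper above. It relies on two things that may or may not apply to this data: `String.split` drops trailing empty fields unless a negative limit is passed, and a quoted field containing a newline would reach a line-based mapper as several fragments, each with fewer than nine columns. Both are assumptions about the input, not a confirmed diagnosis.

```java
import java.util.OptionalInt;

// Hypothetical helper for illustration; not part of the original FreqMapper.
class CsvFieldSketch {
    // Same quote-aware comma pattern used in the question.
    static final String CSV_SPLIT = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Split once. The -1 limit keeps trailing empty fields,
    // which String.split would otherwise discard.
    static String[] fields(String line) {
        return line.split(CSV_SPLIT, -1);
    }

    // Return the integer in the given column, or empty when the row is
    // too short or the cell is not a number, instead of throwing.
    static OptionalInt intAt(String[] fields, int index) {
        if (index < 0 || index >= fields.length) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(fields[index].trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        // Made-up rows: one complete record and one fragment such as
        // the tail of a quoted field that contained a newline.
        String complete = "id,date,\"a, b\",chan,1,tags,10,20,30";
        String fragment = "rest of a description\"";

        String[] ok = fields(complete);
        System.out.println(ok[3] + " -> " + intAt(ok, 8));
        System.out.println("fragment has " + fields(fragment).length + " field(s)");
    }
}
```

Inside the mapper, the same idea would mean calling the split once, using that one array for both index 3 and index 8, and returning early (or incrementing a counter) when the array has fewer than nine entries. Logging the offending lines should show whether short rows or multi-line quoted fields are the cause.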
