Hadoop: passing the entire contents of a file to the map function in MapReduce and appending them to a sequence file

sauutmhj · posted 2021-05-29 in Hadoop

I have to read the entire contents of fileA and pass them to the map function. In the map function, the key is fileB and the value is the contents of fileA. In the OutputFormat's RecordWriter, I use the sequence file writer's append method to append all the values (the entire contents of fileA) to fileB. The problems are:

1. I am loading the entire file contents in the InputFormat's RecordReader and passing them to a single map() call.
2. I am appending the entire contents to the sequence file in one call.

Pseudocode:
InputFormat RecordReader:
@Override
public boolean nextKeyValue() throws IOException, InterruptedException {
  // 'flag' makes the reader emit exactly one record per split.
  if (flag > 0)
    return false;
  flag++;

  String re = /* read the entire contents of fileA into memory */;
  String key = k1;  // k1: placeholder key from the question

  allRecords = new TextArrayWritable(Text.class,
      new Text[] { new Text(key), new Text(re) });
  return true;
}

@Override
public TextArrayWritable getCurrentValue() throws IOException, InterruptedException {
  return allRecords;
}
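TextArrayWritable is not a class that ships with Hadoop, so for the calls above to compile it would have to be a custom ArrayWritable subclass, roughly like the sketch below; the no-argument constructor is what Hadoop's serialization machinery needs to rebuild the value on the reduce/output side.

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// A sketch of the assumed TextArrayWritable used throughout the question.
public class TextArrayWritable extends ArrayWritable {
  public TextArrayWritable() {
    super(Text.class);  // no-arg constructor required for deserialization
  }

  public TextArrayWritable(Class<? extends Writable> valueClass, Writable[] values) {
    super(valueClass, values);
  }
}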

Map Function:

protected void map(Text key, TextArrayWritable value, Context context)
    throws IOException, InterruptedException {
  // Forward the single record; the key becomes the path the RecordWriter appends to.
  context.write(new Text(fileA path), value);
}
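For context, a map-only job using these pieces would presumably be wired up along these lines; WholeFileInputFormat, FileContentsMapper, and SequenceAppendOutputFormat are assumed names for the custom classes, not names from the question.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

// Driver sketch; all three custom class names are assumptions.
Job job = Job.getInstance(new Configuration(), "append-to-sequencefile");
job.setInputFormatClass(WholeFileInputFormat.class);        // hosts the RecordReader above
job.setMapperClass(FileContentsMapper.class);               // hosts the map() above
job.setOutputFormatClass(SequenceAppendOutputFormat.class); // hosts the RecordWriter below
job.setNumReduceTasks(0);                                   // map-only: no shuffle or reduce
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(TextArrayWritable.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);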

OutputFormat RecordWriter:

@Override
public void write(Text fileDir, TextArrayWritable contents) throws IOException,
    InterruptedException {
  // contents.get()[0] is the key, contents.get()[1] the entire file body.
  SequenceFileWriter.append(contents.get()[0], contents.get()[1]);
}
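The SequenceFileWriter field is not shown being created; presumably it is opened when the OutputFormat constructs the RecordWriter, i.e. in getRecordWriter(TaskAttemptContext). A minimal sketch, assuming Hadoop 2.6.1+ (where SequenceFile.Writer.appendIfExists is available) and a target path known at that point:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Sketch: typically placed in OutputFormat.getRecordWriter(TaskAttemptContext).
Configuration conf = context.getConfiguration();
SequenceFile.Writer SequenceFileWriter = SequenceFile.createWriter(conf,
    SequenceFile.Writer.file(new Path("/path/to/fileB")),  // assumed target path
    SequenceFile.Writer.keyClass(Text.class),
    SequenceFile.Writer.valueClass(Text.class),
    SequenceFile.Writer.appendIfExists(true));             // append rather than overwrite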

Both of these are in-memory operations, so a large enough file may throw an out-of-memory error. Is there a way to avoid loading the entire contents into memory while still appending them to the sequence file?
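No confirmed answer here, but one direction worth sketching: a SequenceFile record must be fully materialized before each append, so a single value cannot be streamed; what can change is the record granularity. If the RecordReader emits fixed-size chunks under the same key, no single record (and no single append) ever holds more than one chunk. The trade-off is that the value type becomes BytesWritable and fileB then holds several records that a reader must concatenate. A minimal sketch of such a nextKeyValue(), assuming 'in' is an FSDataInputStream over fileA opened in initialize():

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.io.BytesWritable;

// Chunked-reader sketch; 'in' and 'currentValue' are assumed fields set up
// in initialize(). CHUNK_SIZE is a tuning knob, not a fixed rule.
private static final int CHUNK_SIZE = 8 * 1024 * 1024;  // 8 MB per record
private final byte[] buf = new byte[CHUNK_SIZE];

@Override
public boolean nextKeyValue() throws IOException {
  int n = in.read(buf, 0, buf.length);  // at most one chunk in memory at a time
  if (n < 0)
    return false;                       // fileA exhausted, no more records
  currentValue = new BytesWritable();
  currentValue.set(buf, 0, n);          // copy only the bytes actually read
  return true;                          // the same key is reused for every chunk
}

On the write side each chunk then becomes its own append call, so the writer's memory footprint is bounded by CHUNK_SIZE instead of by the file size.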
