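I have to read the entire contents of file A and pass them to the map function. In the map function, the key is file B and the value is the contents of file A. In the OutputFormat RecordWriter, I use the SequenceFile writer's append method to append all the values (the entire contents of file A) to file B. The problems are: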
1. I load the entire file contents in the InputFormat RecordReader and pass them to a single map call.
2. I append the entire contents to the sequence file in a single append call.
PseudoCode:
InputFormat RecordReader:
@Override
public boolean nextKeyValue() throws IOException, InterruptedException {
    if (flag > 0)
        return false;
    flag++;
    String re = read all contents of file
    String key = k1;
    allRecords = new TextArrayWritable(Text.class, new Text[] {new Text(key),
            new Text(re)});
    return true;
}

@Override
public TextArrayWritable getCurrentValue() throws IOException, InterruptedException {
    return allRecords;
}
Map Function:
protected void map(Text key, TextArrayWritable value,
        Context context) throws IOException,
        InterruptedException {
    context.write(new Text(fileA path), value);
}
OutputFormat RecordWriter:
@Override
public void write(Text fileDir, TextArrayWritable contents) throws IOException,
        InterruptedException {
    SequenceFileWriter.append(contents.get()[0], contents.get()[1]);
}
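Both operations happen entirely in memory, so a sufficiently large file may cause an out-of-memory error. Is there a way to avoid loading the whole content into memory and still append it to the sequence file?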
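One direction that might help is to emit the file as a series of fixed-size chunks instead of one record, so that only one chunk is in memory at a time. The sketch below is plain Java, not Hadoop code: `ChunkedReader` and `ChunkedCopyDemo` are hypothetical names, the `InputStream` stands in for the stream opened on file A, and the `OutputStream` stands in for the sequence file writer. In a real job the value would presumably be a `BytesWritable`, the key would carry the chunk index, and the RecordWriter would call `append` once per chunk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

/** Plain-Java stand-in for a RecordReader that emits one fixed-size chunk per record. */
final class ChunkedReader {
    private final InputStream in;
    private final byte[] buffer;   // reused for every chunk, so memory use stays bounded
    private byte[] currentValue;   // assumption: a BytesWritable in a real Hadoop job
    private long currentKey = -1;  // chunk index, starting at 0 after the first read

    ChunkedReader(InputStream in, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.in = in;
        this.buffer = new byte[chunkSize];
    }

    /** Mirrors nextKeyValue(): loads the next chunk and returns false at end of input. */
    boolean nextKeyValue() throws IOException {
        int filled = 0;
        // A single read() may return fewer bytes than requested, so loop until
        // the buffer is full or the stream ends.
        while (filled < buffer.length) {
            int n = in.read(buffer, filled, buffer.length - filled);
            if (n < 0) {
                break;
            }
            filled += n;
        }
        if (filled == 0) {
            currentValue = null;
            return false;
        }
        currentValue = Arrays.copyOf(buffer, filled);
        currentKey++;
        return true;
    }

    long getCurrentKey() {
        return currentKey;
    }

    byte[] getCurrentValue() {
        return currentValue;
    }
}

final class ChunkedCopyDemo {
    /** Copies the input to the output one chunk at a time and returns the chunk count. */
    static int copyInChunks(InputStream in, OutputStream out, int chunkSize) throws IOException {
        ChunkedReader reader = new ChunkedReader(in, chunkSize);
        int records = 0;
        while (reader.nextKeyValue()) {
            // In the job from the question, map() would forward this chunk and the
            // RecordWriter would append it to the sequence file as its own record.
            out.write(reader.getCurrentValue());
            records++;
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int records = copyInChunks(new ByteArrayInputStream(data), out, 4);
        System.out.println("records=" + records + ", bytes=" + out.size());
    }
}
```

The trade-off is that file B then holds several records per source file rather than one, so whatever reads it has to concatenate the chunks in index order. This also assumes the chunks reach the writer in the order they were read, which should hold for a map-only job over a non-splittable input but is worth verifying.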