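I want to merge several small bzip2 files into a single sequence file. I found some code that creates a sequence file and tried it, but it produces strange output like the following. Is this because it cannot read bzip2 files?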
SEQorg.apache.hadoop.io.Textorg.apache.hadoop.io.Text
�*org.apache.hadoop.io.compress.DefaultCodec����gWŒ‚ÊO≈îbº¡vœÖ��� ���
.DS_StorexúÌò±
¬0EÔ4.S∫a�6∞¢0P∞=0ì·‡/d)ÄDï˛ì¨w≈ù7÷ùØ›⁄ÖüO;≥X¬`’∂µóÆ Æâ¡=Ñ B±lP6Û˛ÜbÅå˜C¢3}ª‘�Lp¥oä"ùËL?jK�&:⁄”Åét¢3]Î
º∑¿˘¸68§ÄÉùø:µ√™*é-¿fifi>!~¯·0Ùˆú ¶ eõ¯c‡ÍÉa◊':”ÍÑòù;I1•�∂©���00.json.bz2xúL\gWTK∞%
,Y
ä( HJFêúsŒ\PrRrŒ9ÁCŒ9√0ÃZUÏÌÊΩÔ≤Ù‚Ãô”’UªvÌÍÓ3£oˆä2ä<˝”-”ãȧπË/d;u¥Û£üV;ÀÒÛ¯Ú˜ˇ˚…≥2¢5Í0‰˝8M⁄,S¸¢`f•†`O<ëüD£≈tÃ¥ó`•´D˚~aº˝«õ˜v'≠)(F|§fiÆÕ ?y¬àœTÒÊYåb…U%E?⁄§efiWˇÒY#üÛÓÓ‚
⁄è„ÍåÚÊU5‡ æ‚Â?q‘°�À{©?íWyü÷ÈûF<[˘éŒhãd>x_ÅÁ
fiÒ_eâ5-—|-M)˙)¸R·ªCÆßs„F>UŒ©ß{o„uÔ&∫˚˚Ÿ?Ä©ßW,”◊Ê∫â«õxã¸[yûgÈñFmx|‡ªÍ¶”¶‡Óp-∆ú§ı
<JN t «F4™@Àä¥Jœ¥‰√|E„‘œ„&º§@g|ˆá{iõOx
The code is:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.GenericOptionsParser;

public class cinput {

    /**
     * @param args
     * @throws IOException
     * @throws IllegalAccessException
     * @throws InstantiationException
     */
    public static void main(String[] args) throws IOException,
            InstantiationException, IllegalAccessException {
        // TODO Auto-generated method stub
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        String[] otherArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();
        Path inputFile = new Path(otherArgs[0]);
        Path outputFile = new Path(otherArgs[1]);
        FSDataInputStream inputStream;
        Text key = new Text();
        Text value = new Text();
        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
                outputFile, key.getClass(), value.getClass());
        FileStatus[] fStatus = fs.listStatus(inputFile);
        for (FileStatus fst : fStatus) {
            String str = "";
            System.out.println("Processing file : " + fst.getPath().getName()
                    + " and the size is : " + fst.getPath().getName().length());
            inputStream = fs.open(fst.getPath());
            key.set(fst.getPath().getName());
            while (inputStream.available() > 0) {
                str = str + inputStream.readLine();
                // System.out.println(str);
            }
            value.set(str);
            writer.append(key, value);
        }
        fs.close();
        IOUtils.closeStream(writer);
        System.out.println("SEQUENCE FILE CREATED SUCCESSFULLY........");
    }
}
The input I am passing is json.bzip2 files. Can anyone point out why I am getting this strange output?
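One thing worth checking, independent of how the result is viewed, is what the read loop in the code above does to compressed bytes. The sketch below reproduces only that loop, using the JDK alone (no Hadoop on the classpath). It relies on two assumptions about Hadoop: that FSDataInputStream inherits the deprecated readLine() from java.io.DataInputStream, and that Text stores its contents as UTF-8. The class name ReadLineDemo, the method viaReadLine, and the sample bytes are made up for illustration and are not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReadLineDemo {

    // Mirrors the loop in the question: call readLine() until the stream is
    // exhausted, concatenate the results, then encode the String as UTF-8
    // (assumed here to be what Text.set(String) does with it).
    @SuppressWarnings("deprecation")
    public static byte[] viaReadLine(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            while (in.available() > 0) {
                // readLine() widens each byte to a char and drops the line terminator.
                sb.append(in.readLine());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for compressed data: a "BZh9" prefix followed by
        // two bytes above 0x7F, a byte that happens to equal '\n', and one more byte.
        byte[] raw = { 'B', 'Z', 'h', '9', (byte) 0xE9, (byte) 0xFF, '\n', 0x01 };
        byte[] stored = viaReadLine(raw);
        System.out.println("raw length    = " + raw.length);
        System.out.println("stored length = " + stored.length);
        System.out.println("identical     = " + Arrays.equals(raw, stored));
    }
}
```

For these eight sample bytes the method returns nine: the byte equal to '\n' is dropped, and each byte above 0x7F becomes two bytes once the String is encoded as UTF-8. If the same thing happens to the real .bz2 contents, the values in the sequence file are neither the original compressed bytes nor the decompressed JSON. Separately, a sequence file is itself a binary container (the header shown above names DefaultCodec), so a raw dump will look like this even when the records are intact; hadoop fs -text is the usual way to print one as key/value text.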