I am creating a serialized TreeMap locally and transferring it to an HDFS directory on Hadoop 2.7.3, as shown below.
// Create the tree map and serialize it to a local folder
String localPath = "/home/test/file";
FileOutputStream fos = new FileOutputStream(localPath);
ObjectOutputStream oos = new ObjectOutputStream(fos);
oos.writeObject(treemap);
oos.close();
fos.close();
// Transfer the serialized file to HDFS
Path hdfsPath = new Path("hdfs://localhost:54310/user/filename");
FileSystem fs = hdfsPath.getFileSystem(conf);
InputStream in = new BufferedInputStream(new FileInputStream(localPath));
OutputStream out = fs.create(hdfsPath);
IOUtils.copyBytes(in, out, 4096, true);
// delete local copy
File f = new File(localPath);
f.delete();
The serialized file is 17 MB. Later, I read the serialized file back from HDFS and deserialize it.
String filename = "hdfs://localhost:54310/user/filename";
FileSystem fs = FileSystem.get(URI.create(filename), conf);
InputStream in = fs.open(new Path(filename));
ObjectInputStream objReader = new ObjectInputStream(in);
map = (TreeMap) objReader.readObject();
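The deserialization alone takes 40 seconds, while the rest of the program runs in just 4-6 seconds. I tried running the deserialization on a local file instead of the HDFS file, and it finished within a few seconds. Any idea why it takes so much longer on HDFS?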
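One thing that might be worth checking (an assumption, not a confirmed diagnosis): in the read code above, the stream returned by fs.open is handed to ObjectInputStream directly, while the upload path wraps its input in a BufferedInputStream. ObjectInputStream can issue many small reads against the underlying stream, and those may be much more expensive on an HDFS stream than on a local file. The sketch below only illustrates the buffered-wrapping pattern; it uses a local temporary file as a stand-in for HDFS, and the class name, the method names, and the 64 KB buffer size are hypothetical choices, not something taken from the program above.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.TreeMap;

class BufferedDeserializeSketch {

    // Hypothetical buffer size; tune it for the actual workload.
    static final int BUFFER_SIZE = 64 * 1024;

    // Serialize the map through a buffered stream and close it when done.
    static void writeMap(TreeMap<String, Integer> map, OutputStream raw) throws IOException {
        try (ObjectOutputStream oos =
                 new ObjectOutputStream(new BufferedOutputStream(raw, BUFFER_SIZE))) {
            oos.writeObject(map);
        }
    }

    // Deserialize the map; the BufferedInputStream turns the many small
    // reads made by ObjectInputStream into fewer, larger reads on 'raw'.
    @SuppressWarnings("unchecked")
    static TreeMap<String, Integer> readMap(InputStream raw)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new BufferedInputStream(raw, BUFFER_SIZE))) {
            return (TreeMap<String, Integer>) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        TreeMap<String, Integer> original = new TreeMap<>();
        for (int i = 0; i < 100_000; i++) {
            original.put("key" + i, i);
        }

        // Local temp file used as a stand-in for the HDFS file.
        File tmp = File.createTempFile("treemap", ".ser");
        try {
            writeMap(original, new FileOutputStream(tmp));

            long start = System.nanoTime();
            TreeMap<String, Integer> copy = readMap(new FileInputStream(tmp));
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;

            System.out.println("entries=" + copy.size() + ", read took " + elapsedMs + " ms");
        } finally {
            tmp.delete();
        }
    }
}
```

With HDFS, the InputStream returned by fs.open(new Path(filename)) would be passed to readMap in place of the FileInputStream. Timing the read with and without the BufferedInputStream wrapper should show whether unbuffered reads are really what costs the 40 seconds.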