Hadoop: extract a remote zip file and decompress it into HDFS in Java

pn9klfpd asked on 2021-05-29 in Hadoop

All I am trying to do is download a zip file from a website, unzip it, and upload its contents to HDFS. The code is as follows:

String src="http://corpus.byu.edu/wikitext-samples/text.zip";
String dst = "hdfs://cshadoop1/user/hxy162130/assignment1";
InputStream a = new URL(src).openStream();
System.out.println(a == null);
ZipInputStream in = new ZipInputStream(a);
System.out.println(in == null);
ZipEntry zE = in.getNextEntry();        
System.out.println(zE == null);

As you can see, I turn the URL into an InputStream with openStream, wrap that InputStream in a ZipInputStream, and finally try to read the first entry from it. The problem is that getNextEntry returns null, so the output of the code is false, false, true. I just can't find where the problem is.


vc9ivgsu1#

An HTTP request to http://corpus.byu.edu/wikitext-samples/text.zip results in a 301 Moved Permanently pointing to a new Location: https://corpus.byu.edu/wikitext-samples/text.zip. Java's HttpURLConnection does not automatically follow a redirect that changes the protocol (http to https), so the stream you get from openStream contains the redirect response rather than ZIP data, and getNextEntry finds no entry.
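One quick way to see this for yourself (not part of the original answer, just a small check sketch) is to open the connection without following redirects and print the status code and the Location header:

import java.net.HttpURLConnection;
import java.net.URL;

class CheckRedirect {

    public static void main(String[] args) throws Exception {
        String src = "http://corpus.byu.edu/wikitext-samples/text.zip";
        HttpURLConnection conn = (HttpURLConnection) new URL(src).openConnection();
        conn.setInstanceFollowRedirects(false);               // keep the raw 301 response visible
        System.out.println(conn.getResponseCode());           // 301
        System.out.println(conn.getHeaderField("Location"));  // https://corpus.byu.edu/wikitext-samples/text.zip
    }
}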
To follow the redirect yourself, you can do something like this:

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class ReadZipInputStream {

    public static void main(String[] args) throws Exception {

        String src = "http://corpus.byu.edu/wikitext-samples/text.zip";
        // 301 Moved Permanently: Location: https://corpus.byu.edu/wikitext-samples/text.zip

        URL url = new URL(src);
        URLConnection connection = url.openConnection();

        // If the server answered with a redirect, open a new connection to the target URL.
        String redirect = connection.getHeaderField("Location");
        if (redirect != null) {
            connection = new URL(redirect).openConnection();
        }

        InputStream a = connection.getInputStream();
        System.out.println(a);

        ZipInputStream in = new ZipInputStream(a);
        System.out.println(in);

        // The stream now starts with real ZIP data, so the first entry is no longer null.
        ZipEntry zE = in.getNextEntry();
        System.out.println(zE);
    }
}
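
Once getNextEntry returns real entries, the remaining step from the question is writing the extracted files into HDFS. Below is only a minimal sketch of that part, using Hadoop's FileSystem API; the class and method names are made up for illustration, while the cluster URI hdfs://cshadoop1 and the target directory /user/hxy162130/assignment1 are taken from the question:

import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

class UnzipToHdfs {

    // Streams every file contained in the zip archive into the given HDFS directory.
    static void unzipToHdfs(InputStream zipStream, String hdfsUri, String targetDir) throws Exception {
        FileSystem fs = FileSystem.get(new URI(hdfsUri), new Configuration());
        try (ZipInputStream in = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // parent directories are created implicitly by fs.create
                }
                Path target = new Path(targetDir, entry.getName());
                try (FSDataOutputStream out = fs.create(target, true)) {
                    // copy the bytes of the current entry without closing the zip stream itself
                    IOUtils.copyBytes(in, out, 4096, false);
                }
                in.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Download URL, cluster URI and target path are taken from the question.
        String src = "http://corpus.byu.edu/wikitext-samples/text.zip";
        URLConnection connection = new URL(src).openConnection();
        String redirect = connection.getHeaderField("Location");
        if (redirect != null) {
            connection = new URL(redirect).openConnection(); // follow the 301 as shown above
        }
        unzipToHdfs(connection.getInputStream(), "hdfs://cshadoop1", "/user/hxy162130/assignment1");
    }
}

IOUtils.copyBytes is called with close set to false so that the shared ZipInputStream stays open for the remaining entries; only the per-entry HDFS output stream is closed after each file.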
