hadoop - Downloading files from FTP to local using Java makes the file unreadable - encoding issues

Posted by n9vozmp4 on 2021-06-03 in Hadoop


I have written code that reads very large files from FTP and writes them to the local machine using Java. The code that does this is shown below. It comes from the `next(Text key, Text value)` method inside the `RecordReader` of a `CustomInputFormat`:

```
if(!processed)
{
    System.out.println("in processed");
    in = fs.open(file);
    processed=true;
}
while(bytesRead <= fileSize) {

    byte buf[] = new byte[1024];

    try {
        in.read(buf);
        in.skip(1024);
        bytesRead+=1024;
        long diff = fileSize-bytesRead;
        if(diff<1024)
        {
            break;
        }
        value.set(buf, 0, 1024); // This is where the value of the record is set and it goes to the mapper.
    }
    catch(Exception e)
    {
        e.printStackTrace();
    }

}
if(diff<1024)
{
    int difference= (int) (fileSize-bytesRead);

    byte buf[] = new byte[difference];
    in.read(buf);
    bytesRead+=difference;
}

System.out.println("closing stream");
in.close();
```

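After the write finishes, the transfer appears complete and the file at the destination has the same size as the file at the source. But I cannot open the file; the editor gives the following error: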

gedit has not been able to detect the character coding.
Please check that you are not trying to open a binary file.
Select a character coding from the menu and try again.

I believe this question is related to mine: "Java upload jpg using JakartaFtpWrapper - makes the file unreadable", but I could not make sense of it.
Any pointers?

Java hadoop amazon-emr ftp elastic-map-reduce

Source: https://stackoverflow.com/questions/14117719/downloading-files-from-ftp-to-local-using-java-makes-the-file-unreadable-encod

2 Answers


pgky5nke1#

I see a lot of problems with your code. It is a strange way of reading an entire file. For example:

```
in.read(buf);
in.skip(1024);
bytesRead+=1024;
```

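is wrong. `in.read(buf)` returns the number of bytes actually read and advances the stream position to old position + n bytes read. So you do not need `skip`; calling it is a bug, because `read` has already positioned the stream, and the extra `skip` discards up to 1024 further bytes on every pass. Adding a fixed 1024 to `bytesRead` is also wrong whenever `read` returns fewer bytes than the buffer holds.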
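To make the effect concrete, here is a minimal sketch that runs the read-then-skip pattern against an in-memory stream. The class and method names (`ReadSkipDemo`, `readThenSkip`, `readOnly`) are made up for illustration, and a `ByteArrayInputStream` stands in for the FTP stream from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ReadSkipDemo {

    // Mirrors the question's pattern: read a chunk, then skip the same amount again.
    static byte[] readThenSkip(InputStream in, int chunk) throws IOException {
        ByteArrayOutputStream kept = new ByteArrayOutputStream();
        byte[] buf = new byte[chunk];
        int n;
        while ((n = in.read(buf)) > 0) {
            kept.write(buf, 0, n);
            in.skip(chunk); // read() already advanced the stream; this throws away the next chunk
        }
        return kept.toByteArray();
    }

    // Relies only on the count returned by read(); no skip.
    static byte[] readOnly(InputStream in, int chunk) throws IOException {
        ByteArrayOutputStream kept = new ByteArrayOutputStream();
        byte[] buf = new byte[chunk];
        int n;
        while ((n = in.read(buf)) > 0) {
            kept.write(buf, 0, n);
        }
        return kept.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "ABCDEFGHIJKL".getBytes(StandardCharsets.US_ASCII);
        // prints ABCDIJKL: the middle chunk was skipped
        System.out.println(new String(
                readThenSkip(new ByteArrayInputStream(data), 4), StandardCharsets.US_ASCII));
        // prints ABCDEFGHIJKL
        System.out.println(new String(
                readOnly(new ByteArrayInputStream(data), 4), StandardCharsets.US_ASCII));
    }
}
```

With a network stream the damage is less regular, because `read` may return fewer bytes than the buffer holds, but the principle is the same: every `skip` removes data that was never copied.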
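Verify the checksums of the two files to make sure they are identical (using MD5 or something similar). I am pretty sure that neither the checksum nor the file size matches.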
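The checksum comparison could look roughly like the sketch below. It assumes both copies are reachable as local paths, which is not the case for the FTP source in the question; there you would compute the digest on the server or on a trusted second download. `ChecksumCheck`, `md5Hex` and `sameContent` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ChecksumCheck {

    // Streams the file through an MD5 digest and returns it as lowercase hex.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                md.update(buf, 0, n); // only the n bytes that were actually read
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Compares size first (cheap), then the digests.
    static boolean sameContent(Path a, Path b) throws IOException, NoSuchAlgorithmException {
        return Files.size(a) == Files.size(b) && md5Hex(a).equals(md5Hex(b));
    }
}
```

MD5 is used here only because the answer mentions it; it is adequate for spotting accidental corruption, not for security purposes.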
You should use Apache Commons IO for file handling. Otherwise, have a look at the Oracle documentation on file handling.

Posted 2021-06-04

x8diyxa72#

Your copy code is complete and utter 100% A-grade nonsense. The canonical way to copy a stream in Java is as follows:

```
int count;
byte[] buffer = new byte[8192]; // or more if you like
while ((count = in.read(buffer)) > 0)
{
  out.write(buffer, 0, count);
}
```

Get rid of all the other fluff. It just wastes time and space, and it is clearly damaging your data in transit.
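For completeness, here is one way the loop above might be wrapped into a self-contained helper that writes to a local file and closes both streams. This is a sketch, not the asker's actual setup: `StreamCopy`, `copy` and `saveTo` are hypothetical names, `in` can be any `InputStream` (such as the one returned by `fs.open(file)` in the question), and `target` is whatever local path you choose.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class StreamCopy {

    // The canonical loop: write exactly as many bytes as read() reported.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int count;
        while ((count = in.read(buffer)) > 0) {
            out.write(buffer, 0, count);
            total += count;
        }
        return total;
    }

    // Copies the stream into a local file and closes both ends, even on failure.
    static long saveTo(InputStream in, Path target) throws IOException {
        try (InputStream src = in; OutputStream out = Files.newOutputStream(target)) {
            return copy(src, out);
        }
    }
}
```

The bytes are written untouched, with no character decoding anywhere, so binary content survives the copy and the returned total can be checked against the expected file size.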

Posted 2021-06-04
