Java: Reading large files using MapReduce in Hadoop

Posted by axkjgtzd on 2021-06-03 in Hadoop


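I have code that reads files from an FTP server and writes them to HDFS. I implemented a custom InputFormatReader that sets the isSplitable property of the input to false. However, this gives me the following error.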

INFO mapred.MapTask: Record too large for in-memory buffer

The code I use to read the data is:

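Path file = fileSplit.getPath();
FileSystem fs = file.getFileSystem(conf);
FSDataInputStream in = null;
try {
    in = fs.open(file);
    // Reads the whole file into the contents byte array in a single call
    IOUtils.readFully(in, contents, 0, contents.length);
    value.set(contents, 0, contents.length);
} finally {
    // Close the stream even if the read fails
    IOUtils.closeStream(in);
}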

Is there any way to avoid the java heap space error without splitting the input file? Or, if I set isSplitable to true, how should I go about reading the file?
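
For context on the memory problem: the snippet above loads the entire file into one byte array, so memory use grows with file size. A minimal sketch of the alternative, copying through a small fixed-size buffer, is below. It uses plain java.io streams rather than Hadoop classes so it runs on its own; the class name ChunkedCopy, the method copy, and the 4096-byte buffer are illustrative assumptions, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copies a stream in fixed-size chunks so that
// memory use depends on the buffer size, not on the file size.
final class ChunkedCopy {

    // Copies everything from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // read() returns -1 at end of stream; it may return fewer bytes than requested
        while ((read = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data; in the real job the streams would be the FTP input and the HDFS output
        byte[] data = new byte[10000];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), out, 4096);
        System.out.println("copied " + copied + " bytes");
    }
}
```

In a Hadoop job the same idea would apply to the FSDataInputStream returned by fs.open(file). Hadoop's org.apache.hadoop.io.IOUtils.copyBytes(in, out, buffSize, close) performs this kind of bounded-buffer copy, so a hand-written loop may not be needed.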

Java hadoop mapreduce amazon-emr elastic-map-reduce

Source: https://stackoverflow.com/questions/14100116/reading-large-files-using-mapreduce-in-hadoop