Reading a file inside the main function - Hadoop

Posted by yk9xbfzb on 2021-06-02 in Hadoop

I am trying to read a file in the main method of a Hadoop job, not in the mapper or the reducer. I am running a custom JAR on Amazon EMR.

The command-line arguments are: -files s3://[path]#source.xml

In the main function I am doing:

File file = new File("source.xml");

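I don't know whether the distributed cache is available in the main function or only in the mapper/reducer functions. Do I need to use the DistributedCache API?
The command line that AWS is executing: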

hadoop jar /mnt/var/lib/hadoop/steps/s-1YBXTPYJ2YK44/JobTeste_SomenteLeitura.jar -files s3://stoneagebrasil/TesteBVS/sources.xml

How can I do this?

Java hadoop amazon-emr distributed-cache emr

Source: https://stackoverflow.com/questions/31947810/reading-file-inside-main-function-hadoop

2 Answers

unguejic #1

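As far as I have found, it is not possible to read a file from the distributed cache inside the Hadoop driver (the main function). The reason is that the file is only distributed (copied to the worker nodes) after the job has been launched.
The solution is to read the file directly from S3.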
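One detail that may matter when reading directly: in a -files value such as s3://[path]#source.xml, the part after # is only the link name the distributed cache uses on the task side, so the driver would need the location without that fragment. A minimal sketch using only java.net.URI follows. The class name, helper name, bucket, and key are hypothetical, chosen for illustration, and nothing here touches S3 itself.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class CacheUriSplitter {

    /** Returns the same URI with its "#fragment" part removed. */
    public static URI withoutFragment(URI uri) {
        try {
            return new URI(uri.getScheme(), uri.getAuthority(),
                    uri.getPath(), uri.getQuery(), null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical bucket and key, shaped like a -files argument.
        URI cacheUri = URI.create("s3://example-bucket/input/sources.xml#source.xml");

        // The location the driver could open directly.
        System.out.println("location:  " + withoutFragment(cacheUri));
        // The link name that tasks would see in their working directory.
        System.out.println("link name: " + cacheUri.getFragment());
    }
}
```

The resulting location can then be handed to whatever S3-capable reader the driver uses, for example Hadoop's FileSystem.get(URI, Configuration), assuming the cluster has an S3 filesystem configured for that scheme.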

2021-06-02

mo49yndu #2

Try:

FileSystem fs = FileSystem.get(configuration);
Path path = new Path("test.txt");

To read the file:

// try-with-resources closes the reader (and the underlying stream) when done
try (BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(path)))) {
    String line;
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
}
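The reading loop above can be pulled into a small helper that works on any InputStream, which makes it possible to try it without a cluster. This is only a sketch: the class and method names are hypothetical, and the in-memory stream in main stands in for the stream that fs.open(path) would return in a real driver. Note also that FileSystem.get(configuration) returns the default filesystem, so for an object in S3 one would likely need FileSystem.get(URI, Configuration) or path.getFileSystem(configuration) with a fully qualified s3:// path instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DriverLineReader {

    /** Reads every line from the stream as UTF-8, then closes the stream. */
    public static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for fs.open(path); in a Hadoop driver the stream would
        // come from the FileSystem object instead of an in-memory buffer.
        InputStream in = new ByteArrayInputStream(
                "<sources>\n</sources>\n".getBytes(StandardCharsets.UTF_8));
        for (String line : readLines(in)) {
            System.out.println(line);
        }
    }
}
```

In the driver, the call would then look like readLines(fs.open(path)), since the stream returned by fs.open is an InputStream.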

2021-06-02
