Distributed cache feature in YARN

Posted by mpbci0fu on 2021-05-30 in Hadoop


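I am currently working with the MapReduce framework, running Hadoop in pseudo-distributed mode. I want to use the distributed cache feature to add some files to the cache and then use them in my map function. How can I do this?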

hadoop

Source: https://stackoverflow.com/questions/27185178/distributed-cache-feature-in-yarn

1 Answer


vngu2lb81#

How to add files to the distributed cache:
Using the hadoop command-line option:

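hadoop jar <application jar> <main class> -files <absolute path to distributed cache file> <input> <output>

Note that -files is a generic option: it has to come before the application arguments, and it only takes effect if the driver parses generic options, for example by running through ToolRunner.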

Using the distributed cache API:

job.addCacheFile(uri);
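
For the uri argument, a minimal sketch of how it might be built is shown here. It relies on the convention that a "#name" fragment on the cache URI is used as the link name in the task's working directory, and that the file's own name is used when there is no fragment. The class name, the helper names, and the HDFS location are hypothetical, not part of the original answer.

```java
import java.net.URI;

// Sketch only: builds the URI one might pass to job.addCacheFile(uri).
final class CacheUriSketch {

    // Appends "#alias" so the localized file is linked under that name
    // in the task's working directory.
    static URI withAlias(String hdfsPath, String alias) {
        return URI.create(hdfsPath + "#" + alias);
    }

    // Name under which the task should see the file: the fragment if one
    // is present, otherwise the last segment of the path.
    static String localName(URI cacheUri) {
        if (cacheUri.getFragment() != null) {
            return cacheUri.getFragment();
        }
        String path = cacheUri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Hypothetical HDFS location, chosen for illustration.
        URI uri = withAlias("hdfs://localhost:9000/cache/lookup.txt", "lookup");
        System.out.println(uri);
        System.out.println(localName(uri));
        // In a driver this would be followed by: job.addCacheFile(uri);
    }
}
```

Giving the file an alias keeps the mapper code independent of the actual file name on HDFS.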

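Either the hadoop -files option or the distributed cache API copies the cache files to all task nodes and makes them available to the mapper/reducer during execution.
How to access the distributed cache:
Override the setup method in your mapper/reducer and call getCacheFiles on the context. Sample code: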

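import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.fs.Path;

@Override
protected void setup(Context context)
        throws IOException, InterruptedException {
    // getCacheFiles() returns the registered URIs, not local paths.
    URI[] cacheFiles = context.getCacheFiles();
    if (cacheFiles == null || cacheFiles.length == 0) {
        throw new FileNotFoundException("Distributed cache file not found.");
    }
    // The file is linked into the task's working directory under its own
    // name, or under the URI fragment if one was given.
    String linkName = cacheFiles[0].getFragment() != null
            ? cacheFiles[0].getFragment()
            : new Path(cacheFiles[0].getPath()).getName();
    File localFile = new File(linkName);
    // code to process cache file
}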

The getCacheFiles method returns an array of the URIs of the files that were set in the configuration.
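
As for what processing the cache file might look like, one common pattern is to load it into an in-memory lookup table during setup. The sketch below assumes a hypothetical tab-separated "key<TAB>value" format, which the original answer does not specify, and uses only the JDK so it can be tried outside Hadoop. The class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Sketch only: turns the contents of a cache file into a lookup map.
final class LookupLoaderSketch {

    // Parses "key<TAB>value" lines; blank or malformed lines are skipped.
    static Map<String, String> parse(BufferedReader reader) {
        Map<String, String> lookup = new HashMap<>();
        reader.lines().forEach(line -> {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2 && !parts[0].isEmpty()) {
                lookup.put(parts[0], parts[1]);
            }
        });
        return lookup;
    }

    public static void main(String[] args) {
        // In setup() the reader would instead wrap the localized file,
        // e.g. new BufferedReader(new FileReader(localFile)).
        BufferedReader reader = new BufferedReader(
                new StringReader("us\tUnited States\nfr\tFrance\n"));
        Map<String, String> lookup = parse(reader);
        System.out.println(lookup.get("fr"));
    }
}
```

Keeping the map in a field of the mapper means each call to map can do a plain in-memory lookup rather than re-reading the file.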
