Store TreeSet on Hadoop DistributedCache

Posted by mrwjdhj3 on 2021-06-03 in Hadoop


I am trying to store a TreeSet in the DistributedCache for use by a Hadoop map-reduce job. So far, I have the following for adding a file from HDFS to the DistributedCache:

Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("/my/cache/path"), conf);
Job job = new Job(conf, "my job");
// Proceed with remainder of Hadoop map-reduce job set-up and running

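How can I efficiently add the TreeSet (which I have already built in this class) to the file that I am adding to the DistributedCache? Should I use Java's native serialization to serialize it to the file?
Note that the TreeSet is built only once, in the main class that launches the map-reduce job. The TreeSet will never be modified; I simply want every mapper to have read-only access to it without having to rebuild it over and over again.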
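
For reference, here is a minimal sketch of the native-serialization approach asked about. It uses only JDK classes and a local temp file so that it runs on its own. The class name `TreeSetCacheFile`, the methods `write` and `read`, and the `String` element type are assumptions made for illustration, not part of the original code. In a real job the output stream would presumably come from `FileSystem.create(Path)` so that the file lands on HDFS, and each mapper would deserialize the set once in `setup()` from the local path returned by `DistributedCache.getLocalCacheFiles(conf)`.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical helper: round-trips a TreeSet<String> through Java's built-in serialization.
final class TreeSetCacheFile {

    // Driver side: serialize the already-built set to a file (done once, before job submission).
    static void write(TreeSet<String> set, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(set);
        }
    }

    // Mapper side: deserialize the set once (e.g. in setup()) and keep it as a read-only field.
    @SuppressWarnings("unchecked")
    static TreeSet<String> read(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (TreeSet<String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        TreeSet<String> original = new TreeSet<>(Arrays.asList("pear", "apple", "fig"));
        File file = File.createTempFile("treeset", ".ser");
        file.deleteOnExit();

        write(original, file);
        TreeSet<String> restored = read(file);

        // The restored set has the same elements in the same sorted order.
        System.out.println(restored.equals(original));
    }
}
```

This works because `TreeSet` implements `Serializable`; the elements (and any custom `Comparator`) must be serializable as well. For very large sets, a plain text file with one element per line, re-inserted into a `TreeSet` in `setup()`, is a common alternative that avoids tying the file format to Java serialization.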

Java hadoop mapreduce serialization distributed-cache

Source: https://stackoverflow.com/questions/16136842/store-treeset-on-hadoop-distributedcache