Java: How to programmatically get all running jobs in a Hadoop cluster using the new API?

Posted by olhwl3o2 on 2021-05-30 in Hadoop


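I have a software component that submits jobs to Hadoop. Before submitting, I now want to check whether other jobs are already running. I found that there is a Cluster object that can be used to query the cluster for running jobs, get their configurations and extract the relevant information from them. However, I am having trouble using it.
Simply calling new Cluster(conf), where conf is a valid Configuration that can be used to access this cluster (for example, to submit jobs to it), leaves the object unconfigured, and the getAllJobStatuses() method of Cluster returns null.
Extracting mapreduce.jobtracker.address from the configuration, constructing an InetSocketAddress from it and passing that to the Cluster constructor throws "Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses."
With the old API, new JobClient(conf).getAllJobs() throws a NullPointerException.
What am I missing? How can I get the running jobs programmatically?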

Java hadoop

Source: https://stackoverflow.com/questions/29644998/how-to-programmatically-get-all-running-jobs-in-a-hadoop-cluster-using-the-new-a

2 Answers


6ojccjat1#

I tried it like this and it worked for me, but only after the job had been submitted:

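// Old API: JobClient and JobStatus from org.apache.hadoop.mapred
JobClient jc = new JobClient(job.getConfiguration());
try {
  for (JobStatus js : jc.getAllJobs()) {
    // State is the enum inherited from org.apache.hadoop.mapreduce.JobStatus
    if (js.getState() == JobStatus.State.RUNNING) {
      // handle the running job here
    }
  }
} finally {
  // Release the client even if the query fails
  jc.close();
}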

Alternatively, you can get the cluster from the Job API; it has methods that return all jobs and their statuses:

cluster.getAllJobStatuses();
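The question reports that getAllJobStatuses() can return null when the Cluster is not properly configured, so a pre-submission check should probably treat null as an error rather than as "nothing is running". The following is a minimal sketch of such a filter. RunningJobFilter is a hypothetical helper, not part of Hadoop, and it is written generically so that it compiles without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper (not part of Hadoop) for a "is anything running?" check. */
final class RunningJobFilter {
    private RunningJobFilter() {}

    /**
     * Returns the entries of statuses that isRunning accepts.
     * A null array is rejected: the question observed null from an
     * unconfigured Cluster, which must not be mistaken for an idle cluster.
     */
    static <S> List<S> running(S[] statuses, Predicate<? super S> isRunning) {
        if (statuses == null) {
            throw new IllegalStateException(
                "No job list returned; the cluster configuration is probably incomplete");
        }
        List<S> result = new ArrayList<>();
        for (S status : statuses) {
            // Skip null entries defensively, keep only the running ones
            if (status != null && isRunning.test(status)) {
                result.add(status);
            }
        }
        return result;
    }
}
```

With the new API this could be called as RunningJobFilter.running(cluster.getAllJobStatuses(), js -> js.getState() == JobStatus.State.RUNNING), where JobStatus is org.apache.hadoop.mapreduce.JobStatus. An empty result then means it is safe to submit.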


bf1o4zei2#

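I investigated further and solved it. Thomas Jungblut was right: the mini-cluster was the cause. I had set up the mini-cluster following this blog post, which turned out to work for MR jobs, but it sets up the mini-cluster in a deprecated way with an incomplete configuration. The Hadoop wiki has a page on how to develop unit tests, which also explains how to set up a mini-cluster correctly.
Essentially, I set up the mini-cluster as follows: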

// Create a YarnConfiguration for bootstrapping the minicluster
final YarnConfiguration bootConf = new YarnConfiguration();
// Base directory to store HDFS data in
final File hdfsBase = Files.createTempDirectory("temp-hdfs-").toFile();
bootConf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, hdfsBase.getAbsolutePath());
// Start Mini DFS cluster
final MiniDFSCluster hdfsCluster = new MiniDFSCluster.Builder(bootConf).build();
// Configure and start Mini MR YARN cluster
bootConf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 64);
bootConf.setClass(YarnConfiguration.RM_SCHEDULER, FifoScheduler.class, ResourceScheduler.class);
final MiniMRYarnCluster yarnCluster = new MiniMRYarnCluster("test-cluster", 1);
yarnCluster.init(bootConf);
yarnCluster.start();
// Get the "real" Configuration to use from now on
final Configuration conf = yarnCluster.getConfig();
// Get the filesystem
final FileSystem fs = new Path("hdfs://localhost:" + hdfsCluster.getNameNodePort() + "/").getFileSystem(conf);

Now I have conf and fs, which I can use to submit jobs and access HDFS, and both new Cluster(conf) and cluster.getAllJobStatuses work as expected.
When everything is done, I shut down and clean up by calling:

yarnCluster.stop();
hdfsCluster.shutdown();
FileUtils.deleteDirectory(hdfsBase); // from Apache Commons IO
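If Apache Commons IO is not available in the test classpath, the temporary HDFS base directory can also be removed with the JDK alone. This is a sketch of one way to do it, not what the answer itself uses. TempDirCleanup and deleteRecursively are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical JDK-only stand-in for FileUtils.deleteDirectory. */
final class TempDirCleanup {
    private TempDirCleanup() {}

    /** Deletes root and everything below it; does nothing if root does not exist. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directory
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

In the cleanup sequence above, the last line would then become TempDirCleanup.deleteRecursively(hdfsBase.toPath()), called after both clusters have been stopped.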

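Note: JAVA_HOME must be set for this to work. When building on Jenkins, make sure JAVA_HOME is set for the default JDK. Alternatively, you can explicitly specify the JDK to use, and Jenkins will then set JAVA_HOME automatically.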
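Since a missing JAVA_HOME only shows up once the mini-cluster is started, a test setup might check it up front and fail with a clear message. The sketch below assumes that checking for a non-empty value pointing at an existing directory is sufficient. JavaHomeCheck is a hypothetical helper, and the environment is passed in as a map so the check can be exercised without changing the real environment.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Hypothetical precondition check to run before starting the mini-cluster. */
final class JavaHomeCheck {
    private JavaHomeCheck() {}

    /** Returns the JAVA_HOME directory from env, or throws if it is unset or not a directory. */
    static Path requireJavaHome(Map<String, String> env) {
        String value = env.get("JAVA_HOME");
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("JAVA_HOME is not set");
        }
        Path home = Paths.get(value);
        if (!Files.isDirectory(home)) {
            throw new IllegalStateException("JAVA_HOME is not a directory: " + value);
        }
        return home;
    }
}
```

A test fixture could call JavaHomeCheck.requireJavaHome(System.getenv()) before yarnCluster.start().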

