MapReduce SDK: How do I wait for a MapReduce job to finish?

2eafrhcq posted on 2021-05-29 in Hadoop

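I am using the Microsoft MapReduce SDK to launch a map-only job.
Calling hadoop.MapReduceJob.ExecuteJob immediately throws a "Response status code does not indicate success: 404 (Not Found)" exception.
When I check the HDInsight query console, the job has started successfully and completes a while later. It also writes the expected output files.
My guess is that ExecuteJob tries to read the output data before the job has finished.
What is the correct way to handle this situation?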

using System;
using System.Linq;
using System.Security.Cryptography.X509Certificates;
using Microsoft.WindowsAzure.Management.HDInsight;
using Microsoft.Hadoop.MapReduce;
using AzureAnalyzer.MultiAnalyzer;

namespace AzureAnalyzer
{
    class Program
    {
        static void Main(string[] args)
        {
            IHadoop hadoop = Hadoop.Connect(Constants.azureClusterUri, Constants.clusterUser,
            Constants.hadoopUser, Constants.clusterPassword, Constants.storageAccount,
            Constants.storageAccountKey, Constants.container, true);

            try {
                var output = hadoop.MapReduceJob.ExecuteJob<MultiAnalyzerJob>();
            }
            catch (Exception ex) 
            {
                Console.WriteLine("\nException: " + ex.Message);
            }
        }  
    }
}
Answer #1 (slmsl1lt)

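I found another way to do the same thing. It takes a bit more effort, because you have to upload the mapper and reducer executables to the Hadoop cluster's storage yourself.
You need to add the Microsoft.Hadoop.Client NuGet package and then the Microsoft Azure HDInsight NuGet package.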

using System;
using System.Threading;                             // Thread.Sleep
using Microsoft.Hadoop.Client;                      // from the NuGet packages mentioned above
using Microsoft.WindowsAzure.Management.HDInsight;

class Program
{
    static void Main()
    {
        // Credentials for the HDInsight cluster (placeholder values).
        var jobcred = new BasicAuthCredential();
        jobcred.UserName = "clusteruserid";
        jobcred.Password = "clusterpassword";
        jobcred.Server = new Uri("https://clusterurl");

        // Streaming job definition. Mapper.exe and Reducer.exe must already be
        // uploaded to the cluster storage at the paths added to Files below.
        var jobpara = new StreamingMapReduceJobCreateParameters()
        {
            JobName = "mapreduce",
            Mapper = "Mapper.exe",
            Reducer = "Reducer.exe",
            Input = "wasb:///mydata/input",
            Output = "wasb:///mydata/Output",
            StatusFolder = "wasb:///mydata/sOutput"
        };
        jobpara.Files.Add("wasb:///mydata/Mapper.exe");
        jobpara.Files.Add("wasb:///mydata/Reducer.exe");

        // Create a Hadoop client to connect to HDInsight.
        var jobClient = JobSubmissionClientFactory.Connect(jobcred);

        // Submit the MapReduce job.
        JobCreationResults mrJobResults = jobClient.CreateStreamingJob(jobpara);

        // Poll every 10 seconds until the job has completed or failed.
        // Sleeping before the refresh means the loop condition always sees a fresh status.
        Console.Write("Job running...");
        JobDetails jobInProgress = jobClient.GetJob(mrJobResults.JobId);
        while (jobInProgress.StatusCode != JobStatusCode.Completed
            && jobInProgress.StatusCode != JobStatusCode.Failed)
        {
            Console.Write(".");
            Thread.Sleep(TimeSpan.FromSeconds(10));
            jobInProgress = jobClient.GetJob(jobInProgress.JobId);
        }

        // Job is complete.
        Console.WriteLine("!");
        Console.WriteLine("Job complete!");
        Console.WriteLine("Press a key to end.");
        Console.Read();
    }
}

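Hope this helps. With this approach I was able to run the job without any exception being thrown,
and the loop genuinely waits for the job to complete.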
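The polling loop above never gives up if the job stays in a non-terminal state. A minimal sketch of the same pattern with a timeout, assuming the job status can be read through a delegate: `WaitForJob` is a hypothetical helper and the status strings are illustrative; in real code `getStatus` would wrap something like `jobClient.GetJob(jobId).StatusCode.ToString()`.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: polls getStatus until it returns a terminal status or
// the timeout elapses. Like the loop above, it treats "Completed" and
// "Failed" as terminal; adjust the set to the statuses your SDK reports.
string WaitForJob(Func<string> getStatus, TimeSpan pollInterval, TimeSpan timeout)
{
    var terminal = new HashSet<string> { "Completed", "Failed" };
    var clock = Stopwatch.StartNew();

    string status = getStatus();
    while (!terminal.Contains(status))
    {
        if (clock.Elapsed >= timeout)
            throw new TimeoutException($"Job still '{status}' after {timeout}.");
        Thread.Sleep(pollInterval);   // wait before asking for a fresh status
        status = getStatus();
    }
    return status;
}

// Demonstration with a fake status source that finishes on the third poll.
var fakeStatuses = new Queue<string>(new[] { "Initializing", "Running", "Completed" });
string finalStatus = WaitForJob(() => fakeStatuses.Dequeue(),
                                TimeSpan.FromMilliseconds(10),
                                TimeSpan.FromSeconds(5));
Console.WriteLine($"Job finished with status: {finalStatus}");
// prints "Job finished with status: Completed"
```

Returning the final status lets the caller branch on "Failed" instead of assuming success, and the timeout keeps a stuck job from blocking the program indefinitely.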

Answer #2 (pvcm50d1)

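Please verify that all the services the program depends on are up and running. A 404 error means that some URL the program calls internally returned Not Found: the server answered, but the resource that was requested is not available at that address.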
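A host that cannot be contacted at all and a host that answers with 404 are different symptoms, and they point to different fixes. A minimal sketch that tells them apart, assuming you probe the cluster endpoint yourself: `ProbeAsync` is a hypothetical helper (not part of the HDInsight SDK), and the request is passed in as a delegate so it can be stubbed offline.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: reports whether an endpoint could not be contacted at all
// or was contacted but answered with an HTTP status such as 404.
// `send` performs the request, so it can be a real HttpClient call or a stub.
async Task<string> ProbeAsync(Uri url, Func<Uri, Task<HttpResponseMessage>> send)
{
    try
    {
        using HttpResponseMessage response = await send(url);
        if (response.StatusCode == HttpStatusCode.NotFound)
            return $"reachable, but {url} returned 404 Not Found";
        return $"reachable, {url} returned {(int)response.StatusCode} {response.StatusCode}";
    }
    catch (HttpRequestException ex)
    {
        // DNS failure, refused connection, TLS problem, and similar.
        return $"unreachable: {ex.Message}";
    }
    catch (TaskCanceledException)
    {
        // HttpClient reports its own timeout as a cancellation.
        return "unreachable: request timed out";
    }
}

// Real use against the cluster (the URL is the placeholder from the first answer):
// using var http = new HttpClient { Timeout = TimeSpan.FromSeconds(10) };
// Console.WriteLine(await ProbeAsync(new Uri("https://clusterurl/"), u => http.GetAsync(u)));

// Offline demonstration with stubbed responses.
var cluster = new Uri("https://clusterurl/");
string notFound = await ProbeAsync(cluster,
    _ => Task.FromResult(new HttpResponseMessage(HttpStatusCode.NotFound)));
string refused = await ProbeAsync(cluster,
    _ => Task.FromException<HttpResponseMessage>(new HttpRequestException("connection refused")));
Console.WriteLine(notFound);  // reachable, but https://clusterurl/ returned 404 Not Found
Console.WriteLine(refused);   // unreachable: connection refused
```

If the probe reports a 404 while the job still runs to completion, the problem is more likely the specific path the client requests than the cluster being down.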
