C# HDInsight MapReduce: passing parameters to the mapper

Posted by bxpogfeg on 2021-05-29 in Hadoop

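To run a MapReduce job you pass the mapper and reducer/combiner as types, so they must have parameterless constructors. Does that mean there is no way to inject properties into a mapper or reducer, either through the constructor or through the Map method?
I am trying to avoid creating several mappers that do exactly the same thing and differ only in which property they look up in the JSON input string.
The example below is adapted from an MSDN blog post for illustration. The mapper converts the input line, which we assume is a JSON string, into an object and pulls out "some property" to map. The question is how to inject "some property" so that we can control the mapper's behavior without writing multiple implementations of the mapper.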

using Microsoft.Hadoop.MapReduce;
using Newtonsoft.Json;

public class MySimpleMapper : MapperBase
{
    public override void Map(string inputLine, MapperContext context)
    {
        //interpret the incoming line as a JSON object (SomeObject is the poster's own type)
        SomeObject obj = JsonConvert.DeserializeObject<SomeObject>(inputLine);
        int value = obj.Properties["some property"];

        //determine whether value is even or odd
        string key = (value % 2 == 0) ? "even" : "odd";

        //output key assignment with value
        context.EmitKeyValue(key, value.ToString());
    }
}

The reducer class, which receives the mapped output from the mapper:

using System.Collections.Generic;
using Microsoft.Hadoop.MapReduce;

public class MySimpleReducer : ReducerCombinerBase
{
    public override void Reduce(
        string key, IEnumerable<string> values, ReducerCombinerContext context)
    {
        //initialize counters
        int myCount = 0;
        int mySum = 0;

        //count and sum incoming values
        foreach (string value in values)
        {
            mySum += int.Parse(value);
            myCount++;
        }

        //output results as count<TAB>sum
        context.EmitKeyValue(key, myCount + "\t" + mySum);
    }
}

Note how we specify the mapper and reducer as type arguments, which is why they need parameterless constructors:


//establish job configuration
HadoopJobConfiguration myConfig = new HadoopJobConfiguration();
myConfig.InputPath = "/demo/simple/in";
myConfig.OutputFolder = "/demo/simple/out";

//connect to cluster
Uri myUri = new Uri("http://localhost");
string userName = "hadoop";
string passWord = null;
IHadoop myCluster = Hadoop.Connect(myUri, userName, passWord);

//execute mapreduce job
MapReduceResult jobResult =
    myCluster.MapReduceJob.Execute<MySimpleMapper, MySimpleReducer>(myConfig);
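The premise of the question, that a framework handed only types can construct them solely through a parameterless constructor, can be illustrated in a few lines. This is a hypothetical Java sketch, not the .NET SDK's code (Hadoop's own Java runtime likewise instantiates the configured mapper class reflectively); the class names `ReflectiveFactory`, `NoArgMapper`, `PropertyMapper` and `ReflectiveDemo` are made up for the example.

```java
// Hypothetical "framework" that receives only a Class object, the way
// Execute<TMapper, TReducer> receives only type arguments.
class ReflectiveFactory {
    static <T> T create(Class<T> type) throws ReflectiveOperationException {
        // getDeclaredConstructor() with no arguments looks up the zero-argument
        // constructor and throws NoSuchMethodException if there is none.
        return type.getDeclaredConstructor().newInstance();
    }
}

// Has the implicit parameterless constructor, so it can be created.
class NoArgMapper {
    String describe() {
        return "created via the parameterless constructor";
    }
}

// Takes the property name as a constructor argument, so it cannot be created.
class PropertyMapper {
    private final String propertyName;

    PropertyMapper(String propertyName) {
        this.propertyName = propertyName;
    }
}

class ReflectiveDemo {
    public static void main(String[] args) throws ReflectiveOperationException {
        // prints "created via the parameterless constructor"
        System.out.println(ReflectiveFactory.create(NoArgMapper.class).describe());
        try {
            ReflectiveFactory.create(PropertyMapper.class);
        } catch (NoSuchMethodException e) {
            // prints "rejected: no parameterless constructor"
            System.out.println("rejected: no parameterless constructor");
        }
    }
}
```

Because the framework never supplies a constructor argument, a per-job setting such as the property name has to travel some other way, for example through the job configuration.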

Answer 1

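You can use the context to pass variables into the map/reduce classes. Below is an example that submits a jar through PowerShell and reads the key inside a Java M/R program.
PowerShell submission: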

$defines = @{"parser.batch.id" = $batchID}  
$jobDef = New-AzureHDInsightMapReduceJobDefinition -JarFile $jarPath -ClassName "com.microsoft.myclass" -Defines $defines -JobName "Parser" -StatusFolder "Parser/$batchID"
$job = Start-AzureHDInsightJob -Cluster $clusterName -JobDefinition $jobDef -verbose -Credential $clusterCred

From your M/R job:

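// e.g. inside the mapper's setup(Context context) method
Configuration conf = context.getConfiguration();
// Keys is the poster's own enum; PARSER_BATCH_ID presumably maps to "parser.batch.id"
String batchID = conf.get(Keys.PARSER_BATCH_ID.getKey());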
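Applied to the question's goal, the same technique gives a single mapper class whose JSON property name comes from the job configuration instead of a constructor argument. The sketch below is in Java to match this answer, and it is deliberately Hadoop-free so it runs on its own, which means several things are assumptions: a `Map<String, String>` stands in for `org.apache.hadoop.conf.Configuration`, a `Map<String, Integer>` stands in for the deserialized JSON record, and the class name `ConfiguredPropertyMapper` and the key `mapper.json.property` are hypothetical. In an actual Hadoop job the value would arrive through `-Defines` as above (or `conf.set(...)` in a Java driver), and the mapper would read it with `context.getConfiguration().get(...)` inside its `setup(Context)` override.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one mapper implementation whose behavior is selected by a
// configuration value rather than a constructor argument. Map<String, String> stands
// in for Hadoop's Configuration; Map<String, Integer> stands in for the parsed JSON.
class ConfiguredPropertyMapper {
    // Hypothetical key; the driver and the mapper only need to agree on the name.
    static final String PROPERTY_KEY = "mapper.json.property";

    private String propertyName;

    // Plays the role of Mapper.setup(Context): read the parameter once per task.
    void setup(Map<String, String> conf) {
        propertyName = conf.getOrDefault(PROPERTY_KEY, "some property");
    }

    // Plays the role of map(): look up the configured property and classify it
    // as "even" or "odd", returned here as "key<TAB>value".
    String map(Map<String, Integer> record) {
        Integer value = record.get(propertyName);
        if (value == null) {
            throw new IllegalArgumentException("missing property: " + propertyName);
        }
        String key = (value % 2 == 0) ? "even" : "odd";
        return key + "\t" + value;
    }

    public static void main(String[] args) {
        Map<String, Integer> record = new HashMap<>();
        record.put("age", 41);
        record.put("score", 10);

        // Same class, two "jobs" configured to look at different properties.
        for (String property : new String[] {"age", "score"}) {
            Map<String, String> conf = new HashMap<>();
            conf.put(PROPERTY_KEY, property);
            ConfiguredPropertyMapper mapper = new ConfiguredPropertyMapper();
            mapper.setup(conf);
            // prints "odd<TAB>41", then "even<TAB>10"
            System.out.println(mapper.map(record));
        }
    }
}
```

Reading the value once in setup rather than on every map call keeps the per-record path cheap; only the configuration value changes between jobs, not the mapper class.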

Resources:
http://www.andrewsmoll.com/3-hacks-for-hadoop-and-hdinsight-clusters/
https://blogs.msdn.microsoft.com/mostlytrue/2014/04/10/merging-small-files-on-hdinsight/
