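To execute a MapReduce job, you must pass the mapper and reducer/combiner types, so they must have parameterless constructors. Does that mean you cannot inject any property into the mapper or reducer instance, either through a constructor or through a method on the mapper?
I am trying to avoid creating multiple mappers that do exactly the same thing and differ only in which property they look up in the JSON string input.
The code below is taken from an MSDN blog for illustration. The mapper converts the input line, which we assume is a JSON string, into an object and pulls out "some property" to map. The question is how to inject "some property" so that we can control the mapper's behavior without creating multiple mapper implementations.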
using Microsoft.Hadoop.MapReduce;
using Newtonsoft.Json; // assumed: JsonConvert comes from Json.NET

public class MySimpleMapper : MapperBase
{
    public override void Map(string inputLine, MapperContext context)
    {
        //interpret the incoming line as a JSON object
        SomeObject obj = JsonConvert.DeserializeObject<SomeObject>(inputLine);
        int value = obj.Properties["some property"];
        //determine whether value is even or odd
        string key = (value % 2 == 0) ? "even" : "odd";
        //output key assignment with value
        context.EmitKeyValue(key, value.ToString());
    }
}
The reducer class, which receives the mapped values from the mapper.
public class MySimpleReducer : ReducerCombinerBase
{
    public override void Reduce(
        string key, IEnumerable<string> values, ReducerCombinerContext context
    )
    {
        //initialize counters
        int myCount = 0;
        int mySum = 0;
        //count and sum incoming values
        foreach (string value in values)
        {
            mySum += int.Parse(value);
            myCount++;
        }
        //output results as tab-separated count and sum
        context.EmitKeyValue(key, myCount + "\t" + mySum);
    }
}
Note how the mapper and reducer are specified by type when the job is executed, which is why a parameterless constructor is required.
//establish job configuration
HadoopJobConfiguration myConfig = new HadoopJobConfiguration();
myConfig.InputPath = "/demo/simple/in";
myConfig.OutputFolder = "/demo/simple/out";
//connect to cluster
Uri myUri = new Uri("http://localhost");
string userName = "hadoop";
string passWord = null;
IHadoop myCluster = Hadoop.Connect(myUri, userName, passWord);
//execute mapreduce job
MapReduceResult jobResult =
    myCluster.MapReduceJob.Execute<MySimpleMapper, MySimpleReducer>(myConfig);
1 Answer
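You can use the context to pass variables into the map/reduce classes. Below is an example that submits a jar via PowerShell and reads the key in a Java M/R program.
PowerShell submission:
From your M/R job: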
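To illustrate the idea only (this is not the answer's original snippet), here is a minimal, self-contained Java sketch. It assumes a plain Map as a stand-in for the job configuration, and the names ConfigInjectionDemo, PropertyMapper and the key "mapper.property.name" are hypothetical. In Hadoop's Java API, the corresponding read would typically be done with context.getConfiguration().get(...) inside the mapper's setup method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConfigInjectionDemo {

    // Hypothetical configuration key, invented for this sketch.
    static final String PROPERTY_KEY = "mapper.property.name";

    /** Stand-in mapper: it is created by type, so it only has a no-arg constructor. */
    static class PropertyMapper {
        private String propertyName;

        public PropertyMapper() {}

        // Runs once before any record is mapped; reads its settings from the job configuration.
        void setup(Map<String, String> jobConfig) {
            propertyName = jobConfig.get(PROPERTY_KEY);
            if (propertyName == null) {
                throw new IllegalStateException("missing setting: " + PROPERTY_KEY);
            }
        }

        // Returns "even" or "odd", a tab, and the value of the configured property.
        String map(Map<String, Integer> record) {
            Integer value = record.get(propertyName);
            if (value == null) {
                throw new IllegalArgumentException("record has no property " + propertyName);
            }
            return (value % 2 == 0 ? "even" : "odd") + "\t" + value;
        }
    }

    // Mimics the framework: instantiate the mapper reflectively, then hand it the configuration.
    static List<String> run(Map<String, String> jobConfig, List<Map<String, Integer>> records) {
        PropertyMapper mapper;
        try {
            mapper = PropertyMapper.class.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("mapper needs a no-arg constructor", e);
        }
        mapper.setup(jobConfig);
        List<String> output = new ArrayList<>();
        for (Map<String, Integer> record : records) {
            output.add(mapper.map(record));
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, Integer> record = new HashMap<>();
        record.put("a", 1);
        record.put("b", 4);

        Map<String, String> jobConfig = new HashMap<>();
        jobConfig.put(PROPERTY_KEY, "b"); // same mapper class, different property per job
        System.out.println(run(jobConfig, List.of(record)));
    }
}
```

The point of the design is that a single mapper class serves every job: only the configuration value changes between submissions, so no constructor arguments are needed.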
Resources:
http://www.andrewsmoll.com/3-hacks-for-hadoop-and-hdinsight-clusters/
http://blogs.msdn.microsoft.com/mostlytrue/2014/04/10/merging-small-files-on-hdinsight/