The input text file I am using contains:
1 "Come
1 "Defects,"
1 "I
1 "Information
1 "J"
2 "Plain
5 "Project
1 "Right
1 "Viator"
The number on the left and the word on the right are separated by a tab, but the job fails when I run the mapper function below.
public static class SortingMapper extends Mapper<Text, Text, Pair, NullWritable>
{
    private Text word = new Text();
    private IntWritable freq = new IntWritable();

    @Override
    public void map(Text key, Text value, Context context) throws IOException, InterruptedException
    {
        String line = value.toString();
        String[] words = line.split("\t");
        freq = new IntWritable(Integer.parseInt(words[0]));
        word.set(words[1]);
        context.write(new Pair(word, freq), NullWritable.get());
    }
}
public static class FirstPartitioner extends Partitioner<Pair, NullWritable>
{
    @Override
    public int getPartition(Pair p, NullWritable n, int numPartitions)
    {
        String word = p.getFirst().toString();
        char first = word.charAt(0);
        char middle = 'n';
        if (middle < first)
        {
            return 0;
        }
        else
            return 1 % numPartitions; // why is the % needed here?
    }
}
public static class KeyComparator extends WritableComparator
{
    protected KeyComparator()
    {
        super(Pair.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2)
    {
        Pair v1 = (Pair) w1;
        Pair v2 = (Pair) w2;
        /*
         * since we already counted the words in the first MR job, we only need to sort the list by frequency,
         * so there is no need to compare the Text again
        int cmp = Pair.compare(v1.getFirst(), v2.getFirst());
        if(cmp != 0) { return cmp; }
        */
        return -1 * v1.compareTo(v2);
        // possible error: it compares the Text first and then compares the IntWritable
    }
}
public static class GroupComparator extends WritableComparator
{
    protected GroupComparator()
    {
        super(Pair.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2)
    {
        Pair v1 = (Pair) w1;
        Pair v2 = (Pair) w2;
        return v1.getFirst().compareTo(v2.getFirst());
        // this compareTo comes from BinaryComparable
    }
}
public static class SortingReducer extends Reducer<Pair, NullWritable, Pair, NullWritable>
{
    @Override
    public void reduce(Pair p, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException
    {
        System.out.println("sortingReducer");
        context.write(p, NullWritable.get());
    }
}
public static void main(String[] args) throws Exception
{
    Configuration conf2 = new Configuration();
    //String[] otherArgs2 = new GenericOptionsParser(conf1, args).getRemainingArgs();
    ControlledJob cJob2 = new ControlledJob(conf2);
    //conf2.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", " ");
    cJob2.setJobName("Sorting");
    Job job2 = cJob2.getJob();
    job2.setJarByClass(Sorting.class);
    job2.setInputFormatClass(KeyValueTextInputFormat.class);
    job2.setMapperClass(SortingMapper.class);
    job2.setPartitionerClass(FirstPartitioner.class);
    job2.setSortComparatorClass(KeyComparator.class);
    job2.setGroupingComparatorClass(GroupComparator.class);
    job2.setReducerClass(SortingReducer.class);
    job2.setOutputKeyClass(Pair.class);
    job2.setOutputValueClass(NullWritable.class);
    job2.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job2, new Path("hdfs:///tmp/inter/part-r-00000.txt"));
    FileOutputFormat.setOutputPath(job2, new Path(args[0]));
    job2.waitForCompletion(true);
}
Here are the errors:
Error: java.lang.NumberFormatException: For input string: ""Come"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:481)
at java.lang.Integer.parseInt(Integer.java:527)
at Sorting$SortingMapper.map(Sorting.java:98)
at Sorting$SortingMapper.map(Sorting.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
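I suspect something is wrong with the String[] words array, but I don't know how to fix it. I would appreciate any help correcting the error.
EDIT
I found that I had set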
job2.setInputFormatClass(KeyValueTextInputFormat.class);
in the main function, which splits each line into a key and a value at the tab character, so I changed
String line = value.toString();
String[] words = line.split("\t");
freq = new IntWritable(Integer.parseInt(words[0]));
word.set(words[1]);
to
String num = key.toString();
freq = new IntWritable(Integer.parseInt(num));
word = value;
context.write(new Pair(word, freq), NullWritable.get());
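The reason the original split failed can be shown with a minimal plain-Java sketch (no Hadoop dependency). The class and method names here are hypothetical, and splitKeyValue only imitates the split at the first tab that KeyValueTextInputFormat performs by default; it is not Hadoop's own code. Because the number has already been moved into the key, the value holds only the word, so words[0] is the word itself rather than a number.

```java
public class KeyValueSplitSketch {
    // Hypothetical helper: imitates splitting a line into key and value at the first tab.
    // If the line has no tab, the whole line becomes the key and the value is empty.
    public static String[] splitKeyValue(String line) {
        int pos = line.indexOf('\t');
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String[] kv = splitKeyValue("1\t\"Come");
        String key = kv[0];   // the count, as text
        String value = kv[1]; // the word only

        // What the original mapper did: split the value again on a tab.
        String[] words = value.split("\t");
        System.out.println(words.length); // 1, so words[1] does not exist
        System.out.println(words[0]);     // "Come  (not a number, hence the NumberFormatException)

        // What the edited mapper does: parse the key instead.
        int freq = Integer.parseInt(key);
        System.out.println(freq + "\t" + value);
    }
}
```

Reading the count from the key and the word from the value, as in the edited mapper, avoids both the parse failure and the out-of-range access to words[1].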
The job now runs successfully, but the output is strange:
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
....
My expected output is
5 "Project
2 "Plain
1 "Come
1 "Defects,"
1 "I
1 "Information
1 "J"
1 "Right
1 "Viator"
Did the change make things worse?
1 Answer
0yg35tkg #1
You just need to override toString in your Pair class and return whatever you want as the final output for each record. Something like this...
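The question does not show the Pair class, so the following is only a minimal sketch under assumptions. It uses a simplified stand-in with a String and an int in place of the Text and IntWritable fields, and the field names are hypothetical. TextOutputFormat writes a non-Text key by calling its toString(), and without an override the inherited Object.toString() produces the Sorting$Pair@5b5b072f style of output seen above.

```java
public class PairToStringSketch {
    // Simplified stand-in for the question's Pair (assumption: the real class
    // wraps a Text word and an IntWritable frequency and implements WritableComparable).
    public static class Pair {
        private final String word;
        private final int freq;

        public Pair(String word, int freq) {
            this.word = word;
            this.freq = freq;
        }

        // Returns the line that should appear in the output file:
        // frequency, a tab, then the word.
        @Override
        public String toString() {
            return freq + "\t" + word;
        }
    }

    public static void main(String[] args) {
        // println calls toString(), just as the output format would for the key.
        System.out.println(new Pair("\"Project", 5));
    }
}
```

In the real Pair, the same override could concatenate the IntWritable and the Text directly, since both already implement toString().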