I am trying to find the top ten movies in a huge dataset using Hadoop, following a map-reduce approach. To sort the data I used a local collection, a TreeMap, but I have read that this approach is not recommended. Could someone tell me the correct way to sort data when a mapper has to process a large amount of it? My mapper and reducer code is below.
Mapper code
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HighestViewedMoviesMapper extends Mapper<Object, Text, NullWritable, Text> {

    // Local top ten for this mapper, sorted ascending by view count.
    // Note: movies with identical view counts collide on the key (see the note below the code).
    private TreeMap<Integer, Text> highestView = new TreeMap<Integer, Text>();

    @Override
    public void map( Object key, Text values, Context context ) throws IOException, InterruptedException {
        String data = values.toString();
        String[] field = data.split( "::", -1 );
        if ( field.length == 2 ) {
            int views = Integer.parseInt( field[1] );
            highestView.put( views, new Text( field[0] + "::" + field[1] ) );
            // Keep only the ten largest counts by evicting the smallest.
            if ( highestView.size() > 10 ) {
                highestView.remove( highestView.firstKey() );
            }
        }
    }

    @Override
    protected void cleanup( Context context ) throws IOException, InterruptedException {
        // Emit this mapper's local top ten once its whole split has been read.
        for ( Map.Entry<Integer, Text> entry : highestView.entrySet() ) {
            context.write( NullWritable.get(), entry.getValue() );
        }
    }
}
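One thing I already noticed myself: because the TreeMap is keyed by the view count alone, two movies with the same count overwrite each other. A rough, untested workaround I considered is a bounded min-heap of (movie, views) pairs in place of the TreeMap; the class below is only an illustration of that idea, not part of my job:

import java.util.Comparator;
import java.util.PriorityQueue;

// Illustrative sketch only: a bounded min-heap keeps the ten records with the
// largest view counts and, unlike a TreeMap keyed by the count, preserves ties.
public class TopTenHeapSketch {
    public static void main( String[] args ) {
        PriorityQueue<String[]> heap = new PriorityQueue<String[]>( 11,
                new Comparator<String[]>() {
                    public int compare( String[] a, String[] b ) {
                        return Integer.compare( Integer.parseInt( a[1] ),
                                                Integer.parseInt( b[1] ) );
                    }
                } );
        String[] lines = { "MovieA::500", "MovieB::500", "MovieC::42" };
        for ( String line : lines ) {
            String[] field = line.split( "::", -1 );
            heap.offer( field );
            if ( heap.size() > 10 ) {
                heap.poll(); // evict the record with the current smallest count
            }
        }
        // Both MovieA and MovieB survive despite the tied count of 500.
        while ( !heap.isEmpty() ) {
            String[] f = heap.poll();
            System.out.println( f[0] + "::" + f[1] );
        }
    }
}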
Reducer code
import java.io.IOException;
import java.util.TreeMap;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class HighestViewMoviesReducer extends Reducer<NullWritable, Text, NullWritable, Text> {

    // Merges the per-mapper top tens into one global top ten.
    private TreeMap<Integer, Text> highestView = new TreeMap<Integer, Text>();

    @Override
    public void reduce( NullWritable key, Iterable<Text> values, Context context )
            throws IOException, InterruptedException {
        for ( Text value : values ) {
            String data = value.toString();
            String[] field = data.split( "::", -1 );
            if ( field.length == 2 ) {
                highestView.put( Integer.parseInt( field[1] ), new Text( value ) );
                if ( highestView.size() > 10 ) {
                    highestView.remove( highestView.firstKey() );
                }
            }
        }
        // Write the final top ten in descending order of views.
        for ( Text t : highestView.descendingMap().values() ) {
            context.write( NullWritable.get(), t );
        }
    }
}
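For reference, one alternative I have come across is to let the shuffle do the sorting instead of a local collection: emit the view count itself as the map output key, configure the job with a single reduce task and a descending sort comparator, and stop writing after ten records. Below is a rough, untested sketch of that idea; the class names are my own, and the driver wiring (job.setNumReduceTasks(1), job.setSortComparatorClass(...)) is assumed rather than shown:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: the mapper emits the view count as the key so that the
// shuffle, not a local collection, does the sorting.
class ViewCountMapper extends Mapper<Object, Text, IntWritable, Text> {
    @Override
    public void map( Object key, Text value, Context context )
            throws IOException, InterruptedException {
        String[] field = value.toString().split( "::", -1 );
        if ( field.length == 2 ) {
            context.write( new IntWritable( Integer.parseInt( field[1] ) ),
                           new Text( field[0] ) );
        }
    }
}

// Assumes a single reduce task and a sort comparator that orders the
// IntWritable keys descending, so the highest counts arrive first.
class TopTenViewsReducer extends Reducer<IntWritable, Text, Text, IntWritable> {
    private int written = 0;

    @Override
    public void reduce( IntWritable views, Iterable<Text> movies, Context context )
            throws IOException, InterruptedException {
        for ( Text movie : movies ) {
            if ( written++ < 10 ) {
                context.write( movie, views );
            }
        }
    }
}

As far as I understand, the descending comparator can be written by subclassing IntWritable.Comparator and negating the result of its compare method, but I am not sure whether this whole approach scales better than the TreeMap one.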
Can anyone tell me the best way to do this? Thanks in advance.