My MapReduce program keeps spilling

ss2ws0br posted on 2021-05-29 in Hadoop

I wrote a MapReduce program. At first it ran fine, but then I changed something, and suddenly my computer reported that it was out of memory. I realized the job was using an enormous amount of space and I didn't know why. After I deleted the spill files, the program no longer ran correctly: it spills endlessly, and I can't remember which code I changed. Here are my mapper, reducer, driver, and the console messages:
Mapper:

package SalesProduct;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SalesCategoryMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {

    private DoubleWritable one = new DoubleWritable(1);

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // TODO Auto-generated method stub
        String valueString = value.toString();
        StringTokenizer tokenizerArticle = new StringTokenizer(valueString, "\n");
        System.out.println("Here: In map \n");
        while (tokenizerArticle.hasMoreTokens()) {
            //StringTokenizer tokenizer = new StringTokenizer(tokenizerArticle.nextToken());

            String[] items = valueString.split("\t");

            String itemName = items[3];
            double itemPrice = Double.parseDouble(items[4]);

            context.write(new Text(itemName), new DoubleWritable(itemPrice));
            //context.write(new Text(itemName), one);
        }
    }
}
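
Looking at that loop again, one thing stands out: the while condition tests tokenizerArticle.hasMoreTokens(), but the nextToken() call is commented out, so the tokenizer never advances. Any non-empty line therefore enters the loop and never leaves it, writing the same (itemName, itemPrice) record over and over, which alone produces unbounded map output. A minimal corrected sketch of the map method (assuming, as the original indices imply, tab-separated input with the item name in field 3 and the price in field 4):

public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // TextInputFormat already delivers one line per map() call,
    // so there is no need to re-tokenize the value on newlines.
    String[] items = value.toString().split("\t");
    if (items.length > 4) { // skip short or malformed lines
        String itemName = items[3];
        double itemPrice = Double.parseDouble(items[4]);
        context.write(new Text(itemName), new DoubleWritable(itemPrice));
    }
}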

Reducer:

package SalesProduct;

import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Reducer.Context;

public class SalesItemCategoryReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

    private DoubleWritable result = new DoubleWritable();

    public void reduce(Text t_key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {

        Text key = t_key;
        double sum = 0;

        for (DoubleWritable val : values) {
            sum = sum + val.get();
        }

        /*
        while (values.hasNext()) {
            DoubleWritable tmp = values.next();
            sum = sum + tmp.get();
        }*/
        //result.set(sum);
        context.write(key, result);
    }
}
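
A second detail: result.set(sum) is commented out, so the reducer emits the DoubleWritable's default value of 0.0 for every key, which would explain output that doesn't match expectations even when the job finishes. (The import of org.apache.hadoop.mapreduce.Reducer.Context is also unnecessary; Context is inherited.) A corrected reduce method might look like:

public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
        throws IOException, InterruptedException {
    double sum = 0;
    for (DoubleWritable val : values) {
        sum += val.get();
    }
    result.set(sum);            // store the total before emitting it
    context.write(key, result);
}

Note that this class is also registered as the combiner in the driver below, so the missing set() zeroes values at the combine stage as well. Reusing a summing reducer as a combiner is otherwise fine, because addition is associative.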

Driver:

package SalesResult;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SalesItemDriver {
    public static void main(String[] args) throws ClassNotFoundException, IOException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "SalesItemDriver");
        job.setJarByClass(SalesItemDriver.class);

        // get category
        job.setMapperClass(SalesProduct.SalesCategoryMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DoubleWritable.class);

        job.setCombinerClass(SalesProduct.SalesItemCategoryReducer.class);
        job.setReducerClass(SalesProduct.SalesItemCategoryReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);

        // set the input split size
        job.setInputFormatClass(CombineTextInputFormat.class);
        CombineTextInputFormat.setMaxInputSplitSize(job, 4194304);
        CombineTextInputFormat.setMinInputSplitSize(job, 2097152);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        //FileOutputFormat.setOutputPath(job, new Path(args[1]));

        Path path = new Path(args[1]);
        FileSystem fs = FileSystem.get(conf);

        if (fs.exists(path)) {
            fs.delete(path, true);
        }

        FileOutputFormat.setOutputPath(job, path);

        job.waitForCompletion(true);
    }
}
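
A side note on the driver: the console warning "Implement the Tool interface and execute your application with ToolRunner" below can be addressed by letting the driver extend Configured and implement Tool. Here is a sketch with the same job setup; the CombineTextInputFormat lines are left out, since with a roughly 4 MB max split a 134 MB input would yield about 32 splits while the log reports only 2, which suggests the posted log predates those lines anyway:

package SalesResult;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SalesItemDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "SalesItemDriver");
        job.setJarByClass(SalesItemDriver.class);

        job.setMapperClass(SalesProduct.SalesCategoryMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DoubleWritable.class);

        // Safe to reuse the reducer as the combiner: summing is associative.
        job.setCombinerClass(SalesProduct.SalesItemCategoryReducer.class);
        job.setReducerClass(SalesProduct.SalesItemCategoryReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));

        Path out = new Path(args[1]);
        FileSystem fs = FileSystem.get(getConf());
        if (fs.exists(out)) {
            fs.delete(out, true);   // remove output from a previous run
        }
        FileOutputFormat.setOutputPath(job, out);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new SalesItemDriver(), args));
    }
}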

2017-04-21 22:04:50,780 WARN [org.apache.hadoop.util.NativeCodeLoader] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-04-21 22:04:51,843 INFO [org.apache.hadoop.conf.Configuration.deprecation] - session.id is deprecated. Instead, use dfs.metrics.session-id
2017-04-21 22:04:51,844 INFO [org.apache.hadoop.metrics.jvm.JvmMetrics] - Initializing JVM Metrics with processName=JobTracker, sessionId=
2017-04-21 22:04:52,132 WARN [org.apache.hadoop.mapreduce.JobResourceUploader] - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2017-04-21 22:04:52,138 WARN [org.apache.hadoop.mapreduce.JobResourceUploader] - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2017-04-21 22:04:52,148 INFO [org.apache.hadoop.mapreduce.lib.input.FileInputFormat] - Total input files to process : 1
2017-04-21 22:04:52,256 INFO [org.apache.hadoop.mapreduce.JobSubmitter] - number of splits:2
2017-04-21 22:04:52,412 INFO [org.apache.hadoop.mapreduce.JobSubmitter] - Submitting tokens for job: job_local1001883244_0001
2017-04-21 22:04:52,646 INFO [org.apache.hadoop.mapreduce.Job] - The url to track the job: http://localhost:8080/
2017-04-21 22:04:52,647 INFO [org.apache.hadoop.mapreduce.Job] - Running job: job_local1001883244_0001
2017-04-21 22:04:52,648 INFO [org.apache.hadoop.mapred.LocalJobRunner] - OutputCommitter set in config null
2017-04-21 22:04:52,653 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2017-04-21 22:04:52,653 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2017-04-21 22:04:52,654 INFO [org.apache.hadoop.mapred.LocalJobRunner] - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2017-04-21 22:04:52,717 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Waiting for map tasks
2017-04-21 22:04:52,718 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Starting task: attempt_local1001883244_0001_m_000000_0
2017-04-21 22:04:52,742 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2017-04-21 22:04:52,742 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2017-04-21 22:04:52,754 INFO [org.apache.hadoop.yarn.util.ProcfsBasedProcessTree] - ProcfsBasedProcessTree currently is supported only on Linux.
2017-04-21 22:04:52,754 INFO [org.apache.hadoop.mapred.Task] -  Using ResourceCalculatorProcessTree : null
2017-04-21 22:04:52,760 INFO [org.apache.hadoop.mapred.MapTask] - Processing split: hdfs://localhost:9000/input/purchases.txt:0+134217728
2017-04-21 22:04:52,837 INFO [org.apache.hadoop.mapred.MapTask] - (EQUATOR) 0 kvi 26214396(104857584)
2017-04-21 22:04:52,837 INFO [org.apache.hadoop.mapred.MapTask] - mapreduce.task.io.sort.mb: 100
2017-04-21 22:04:52,837 INFO [org.apache.hadoop.mapred.MapTask] - soft limit at 83886080
2017-04-21 22:04:52,837 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufvoid = 104857600
2017-04-21 22:04:52,837 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396; length = 6553600
2017-04-21 22:04:52,841 INFO [org.apache.hadoop.mapred.MapTask] - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2017-04-21 22:04:53,652 INFO [org.apache.hadoop.mapreduce.Job] - Job job_local1001883244_0001 running in uber mode : false
2017-04-21 22:04:53,654 INFO [org.apache.hadoop.mapreduce.Job] -  map 0% reduce 0%
2017-04-21 22:04:54,718 INFO [org.apache.hadoop.mapred.MapTask] - Spilling map output
2017-04-21 22:04:54,718 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufend = 49471275; bufvoid = 104857600
2017-04-21 22:04:54,718 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396(104857584); kvend = 17610700(70442800); length = 8603697/6553600
2017-04-21 22:04:54,718 INFO [org.apache.hadoop.mapred.MapTask] - (EQUATOR) 58074971 kvi 14518736(58074944)
2017-04-21 22:04:55,730 INFO [org.apache.hadoop.mapred.MapTask] - Finished spill 0
2017-04-21 22:04:55,738 INFO [org.apache.hadoop.mapred.MapTask] - (RESET) equator 58074971 kv 14518736(58074944) kvi 12367824(49471296)
2017-04-21 22:04:56,831 INFO [org.apache.hadoop.mapred.MapTask] - Spilling map output
2017-04-21 22:04:56,831 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 58074971; bufend = 2688654; bufvoid = 104857592
2017-04-21 22:04:56,831 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 14518736(58074944); kvend = 5915040(23660160); length = 8603697/6553600
2017-04-21 22:04:56,831 INFO [org.apache.hadoop.mapred.MapTask] - (EQUATOR) 11292334 kvi 2823076(11292304)
2017-04-21 22:04:57,661 INFO [org.apache.hadoop.mapred.MapTask] - Finished spill 1
2017-04-21 22:04:57,670 INFO [org.apache.hadoop.mapred.MapTask] - (RESET) equator 11292334 kv 2823076(11292304) kvi 672168(2688672)
2017-04-21 22:04:58,665 INFO [org.apache.hadoop.mapred.MapTask] - Spilling map output
2017-04-21 22:04:58,665 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 11292334; bufend = 60763609; bufvoid = 104857600
2017-04-21 22:04:58,665 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 2823076(11292304); kvend = 20433780(81735120); length = 8603697/6553600
2017-04-21 22:04:58,665 INFO [org.apache.hadoop.mapred.MapTask] - (EQUATOR) 69367289 kvi 17341816(69367264)
2017-04-21 22:04:59,369 INFO [org.apache.hadoop.mapred.MapTask] - Finished spill 2
2017-04-21 22:04:59,377 INFO [org.apache.hadoop.mapred.MapTask] - (RESET) equator 69367289 kv 17341816(69367264) kvi 15190908(60763632)
2017-04-21 22:05:00,401 INFO [org.apache.hadoop.mapred.MapTask] - Spilling map output
2017-04-21 22:05:00,401 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 69367289; bufend = 13980964; bufvoid = 104857600
2017-04-21 22:05:00,401 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 17341816(69367264); kvend = 8738120(34952480); length = 8603697/6553600
2017-04-21 22:05:00,401 INFO [org.apache.hadoop.mapred.MapTask] - (EQUATOR) 22584644 kvi 5646156(22584624)
2017-04-21 22:05:01,083 INFO [org.apache.hadoop.mapred.MapTask] - Finished spill 3
2017-04-21 22:05:01,092 INFO [org.apache.hadoop.mapred.MapTask] - (RESET) equator 22584644 kv 5646156(22584624) kvi 3495248(13980992)
2017-04-21 22:05:02,071 INFO [org.apache.hadoop.mapred.MapTask] - Spilling map output
2017-04-21 22:05:02,071 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 22584644; bufend = 72055919; bufvoid = 104857600
2017-04-21 22:05:02,071 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 5646156(22584624); kvend = 23256860(93027440); length = 8603697/6553600
2017-04-21 22:05:02,071 INFO [org.apache.hadoop.mapred.MapTask] - (EQUATOR) 80659599 kvi 20164892(80659568)
2017-04-21 22:05:02,769 INFO [org.apache.hadoop.mapred.MapTask] - Finished spill 4
2017-04-21 22:05:02,777 INFO [org.apache.hadoop.mapred.MapTask] - (RESET) equator 80659599 kv 20164892(80659568) kvi 18013984(72055936)
2017-04-21 22:05:03,792 INFO [org.apache.hadoop.mapred.MapTask] - Spilling map output
2017-04-21 22:05:03,792 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 80659599; bufend = 25273274; bufvoid = 104857600
2017-04-21 22:05:03,792 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 20164892(80659568); kvend = 11561196(46244784); length = 8603697/6553600
2017-04-21 22:05:03,792 INFO [org.apache.hadoop.mapred.MapTask] - (EQUATOR) 33876954 kvi 8469232(33876928)
2017-04-21 22:05:04,491 INFO [org.apache.hadoop.mapred.MapTask] - Finished spill 5
2017-04-21 22:05:04,499 INFO [org.apache.hadoop.mapred.MapTask] - (RESET) equator 33876954 kv 8469232(33876928) kvi 6318324(25273296)
2017-04-21 22:05:04,755 INFO [org.apache.hadoop.mapred.LocalJobRunner] - map > map
2017-04-21 22:05:05,507 INFO [org.apache.hadoop.mapred.MapTask] - Spilling map output

It just keeps spilling until I kill it. Why? I'm confused... I'm running this program on my own computer, not in the cloud. My machine has only 5 GB of memory; does that matter? It was runnable at first, meaning it could produce a part-00000 output file, although the contents didn't match my expectations... Now it produces many such files.
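
For what it's worth, the log numbers explain the mechanics. Each map task has a 100 MB in-memory sort buffer (mapreduce.task.io.sort.mb: 100, bufvoid = 104857600) and starts a background spill to disk once the buffer passes 80% of capacity, the default mapreduce.map.sort.spill.percent; that is the "soft limit at 83886080" line. Spilling is normal and bounded for finite input, but the mapper loop above never consumes a token, so map() keeps emitting the same record, the buffer keeps refilling, and a new spill file appears every couple of seconds until the disk fills. The 5 GB of RAM is not the limiting factor; spill files live on disk. A quick sanity check of those figures:

// mapreduce.task.io.sort.mb = 100 -> a 100 MB sort buffer
long bufferBytes = 100L * 1024 * 1024;        // 104857600, the log's bufvoid
long softLimit = (long) (bufferBytes * 0.80); // 83886080, the log's "soft limit"
System.out.println(bufferBytes + " / " + softLimit);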
