Java - Find the sales breakdown by product category across all of our stores

h43kikqp  posted on 2021-06-01  in  Hadoop

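I have a sales file that contains the store name, location, sale price, product name, and so on. Judging by the sample, each line has six fields: date, time, store location, product category, sale amount, and payment method. The format of the file is as follows: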

2012-01-01  09:00   San Jose    Men's Clothing  214.05  Amex
2012-01-01  09:00   Fort Worth  Women's Clothing    153.57  Visa
2012-01-01  09:00   San Diego   Music   66.08   Cash
2012-01-01  09:00   Pittsburgh  Pet Supplies    493.51  Discover
2012-01-01  09:00   Omaha   Children's Clothing 235.63  MasterCard
2012-01-01  09:00   Stockton    Men's Clothing  247.18  MasterCard

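I want to write a MapReduce job that finds the sales breakdown by product category across all of our stores. My code (including the mapper and reducer) is shown below: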

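import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public final class P1Q1 {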

    public static final class P1Q1Map extends Mapper<LongWritable, Text, Text, DoubleWritable> {

        private final Text word = new Text();

        public final void map(final LongWritable key, final Text value, final Context context)
                throws IOException, InterruptedException {

            final String line = value.toString();
            final String[] data = line.trim().split("\t");

            if (data.length == 6) {

                final String product = data[3];
                final double sales = Double.parseDouble(data[4]);

                word.set(product);
                context.write(word, new DoubleWritable(sales));
            }
        }
    }

    public static final class P1Q1Reduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

        public final void reduce(final Text key, final Iterable<DoubleWritable> values, final Context context)
                throws IOException, InterruptedException {

            double sum = 0.0;

            for (final DoubleWritable val : values) {
                sum += val.get();
            }

            context.write(key, new DoubleWritable(sum));
        }
    }

    public final static void main(final String[] args) throws Exception {

        final Configuration conf = new Configuration();

        final Job job = new Job(conf, "P1Q1");
        job.setJarByClass(P1Q1.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);

        job.setMapperClass(P1Q1Map.class);
        job.setCombinerClass(P1Q1Reduce.class);
        job.setReducerClass(P1Q1Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}

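The output this code produces is incorrect and does not match the Udacity result.
Does anyone know whether this is the right approach, and how to do it correctly?
Note
I get a completely wrong result in the output file: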

Baby    5.749180844000035E7
Books   5.745075790999787E7
CDs 5.741075304000156E7
Cameras 5.7299046639999785E7
Children's Clothing 5.762482094000117E7
Computers   5.7315406319999576E7
Consumer Electronics    5.745237412999948E7
Crafts  5.7418154499999225E7
DVDs    5.764921213999939E7
Garden  5.7539833110000335E7
Health and Beauty   5.748158956000019E7
Men's Clothing  5.76212790400011E7
Music   5.749548970000038E7
Pet Supplies    5.71972502400004E7
Sporting Goods  5.7599085889999546E7
Toys    5.746347710999843E7
Video Games 5.7513165580000155E7
Women's Clothing    5.74344489699993E7

I thought that commenting out the combiner would fix it. I did that, and the result did not change.

job.setCombinerClass(P1Q1Reduce.class);

I have provided links to the code and the purchases.txt file here. If anyone tries to solve the problem and submits it successfully, please let me know.

wkyowqbh  #1

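For the most part, I'd say your code looks fine. The combiner is only an optimization, so leaving it out should produce the same output as including it.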
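To illustrate why the combiner should not change the totals, here is a minimal sketch in plain Java (no Hadoop involved). It assumes two hypothetical input splits built from the six sample rows, and the class and method names (`CombinerSketch`, `sumByKey`, and so on) are made up for the illustration. Summing is associative up to floating-point rounding, so pre-summing each split and then summing the partial results gives the same per-category totals as summing everything in one pass.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSketch {

    // Hypothetical input splits: the six sample rows, three per mapper.
    static final List<Map.Entry<String, Double>> SPLIT_1 = Arrays.asList(
            sale("Men's Clothing", 214.05),
            sale("Women's Clothing", 153.57),
            sale("Music", 66.08));

    static final List<Map.Entry<String, Double>> SPLIT_2 = Arrays.asList(
            sale("Pet Supplies", 493.51),
            sale("Children's Clothing", 235.63),
            sale("Men's Clothing", 247.18));

    // Stands in for one (category, amount) record emitted by a mapper.
    static Map.Entry<String, Double> sale(String category, double amount) {
        return new SimpleEntry<>(category, amount);
    }

    // The shared combiner/reducer logic: sum the amounts per category.
    static Map<String, Double> sumByKey(List<Map.Entry<String, Double>> records) {
        Map<String, Double> totals = new TreeMap<>();
        for (Map.Entry<String, Double> record : records) {
            totals.merge(record.getKey(), record.getValue(), Double::sum);
        }
        return totals;
    }

    // No combiner: the reducer sees every raw mapper record.
    static Map<String, Double> withoutCombiner() {
        List<Map.Entry<String, Double>> all = new ArrayList<>(SPLIT_1);
        all.addAll(SPLIT_2);
        return sumByKey(all);
    }

    // With combiner: each split is pre-summed, then the reducer sums the partials.
    static Map<String, Double> withCombiner() {
        List<Map.Entry<String, Double>> partials = new ArrayList<>();
        partials.addAll(sumByKey(SPLIT_1).entrySet());
        partials.addAll(sumByKey(SPLIT_2).entrySet());
        return sumByKey(partials);
    }

    public static void main(String[] args) {
        System.out.println("without combiner: " + withoutCombiner());
        System.out.println("with combiner:    " + withCombiner());
    }
}
```

On very large inputs the two runs can differ in the last few digits, because floating-point addition is order-sensitive, but not by anything that would explain a "completely wrong" result.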
I wrote my own MapReduce job and got the following output for the given input:

Children's Clothing 235.63
Men's Clothing  461.23
Music   66.08
Pet Supplies    493.51
Women's Clothing    153.57

Obviously, if you have hundreds or thousands of stores, the totals will run into the millions of currency units, as your output shows.
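Part of what makes the output in the question look odd may simply be the scientific notation: as far as I know, `DoubleWritable` is written out with the same text `Double.toString` would give, which switches to the `E7` form for values this large. The sketch below takes the "Baby" total from the question and formats it with the same `#.00` pattern used in my reducer. The class name `TotalsFormatSketch` and the helper `plain` are made up for the illustration, and the locale is pinned only so the decimal separator is predictable.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class TotalsFormatSketch {

    // Formats a total with two decimals and no exponent.
    // Locale.ROOT keeps "." as the decimal separator on any machine.
    static String plain(double total) {
        DecimalFormat df = new DecimalFormat("#.00", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return df.format(total);
    }

    public static void main(String[] args) {
        double baby = 5.749180844000035E7; // "Baby" total copied from the question's output

        System.out.println(baby);        // default double formatting, scientific notation
        System.out.println(plain(baby)); // prints 57491808.44
    }
}
```

Read that way, the "Baby" line is a total of roughly 57.5 million, which is plausible for a large purchases file.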
Code:

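import java.io.IOException;
import java.text.DecimalFormat;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Assumption: the enclosing class, the APP_NAME value, and main() were not part
// of the original snippet; they are filled in here only so that it compiles.
public class StoreSumRunner extends Configured implements Tool {

    private static final String APP_NAME = "StoreSumRunner";

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new StoreSumRunner(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, APP_NAME);
        job.setJarByClass(StoreSumRunner.class);

        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(CurrencyReducer.class);

        // The mapper emits (Text, DoubleWritable); the reducer emits (Text, Text).
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DoubleWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {

        private final Text key = new Text();
        private final DoubleWritable sales = new DoubleWritable();

        @Override
        protected void map(LongWritable offset, Text value, Context context) throws IOException, InterruptedException {
            final String line = value.toString();
            // Splits on runs of two or more whitespace characters, which matches
            // the sample as pasted in the question; a single tab is NOT a separator here.
            final String[] data = line.trim().split("\\s\\s+");

            if (data.length < 6) {
                System.err.printf("mapper: not enough fields for %s%n", Arrays.toString(data));
                return;
            }

            // Field 3 is the product category, field 4 is the sale amount.
            key.set(data[3]);

            try {
                sales.set(Double.parseDouble(data[4]));
                context.write(key, sales);
            } catch (NumberFormatException ex) {
                System.err.printf("mapper: invalid value format %s%n", data[4]);
            }
        }
    }

    public static class CurrencyReducer extends Reducer<Text, DoubleWritable, Text, Text> {
        private final Text output = new Text();
        // Two decimal places, no scientific notation.
        private final DecimalFormat df = new DecimalFormat("#.00");

        @Override
        protected void reduce(Text category, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException {
            double sum = 0;
            for (DoubleWritable value : values) {
                sum += value.get();
            }
            output.set(df.format(sum));
            context.write(category, output);
        }
    }
}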
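One practical difference between the two mappers is how they split a line: the question splits on a single tab, while my mapper splits on runs of two or more whitespace characters. The sketch below compares the two on a tab-separated line and on the space-padded line as it appears in the question. The class `SplitSketch`, its helper names, and the combined pattern in `splitOnEither` are assumptions for illustration, not part of either job.

```java
public class SplitSketch {

    // The question's approach: one tab between fields.
    static String[] splitOnTab(String line) {
        return line.trim().split("\t");
    }

    // The answer's approach: two or more whitespace characters between fields.
    static String[] splitOnWideGaps(String line) {
        return line.trim().split("\\s\\s+");
    }

    // A hypothetical pattern that accepts either layout.
    static String[] splitOnEither(String line) {
        return line.trim().split("\\s{2,}|\\t");
    }

    public static void main(String[] args) {
        String tabbed = "2012-01-01\t09:00\tSan Jose\tMen's Clothing\t214.05\tAmex";
        String spaced = "2012-01-01  09:00   San Jose    Men's Clothing  214.05  Amex";

        System.out.println(splitOnTab(tabbed).length);      // 6
        System.out.println(splitOnTab(spaced).length);      // 1
        System.out.println(splitOnWideGaps(tabbed).length); // 1
        System.out.println(splitOnWideGaps(spaced).length); // 6
        System.out.println(splitOnEither(tabbed)[3]);       // Men's Clothing
        System.out.println(splitOnEither(spaced)[3]);       // Men's Clothing
    }
}
```

So whichever pattern you use has to match the separator that is actually in purchases.txt; a mismatch yields a single field per line, and both mappers would then skip every record.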
