Getting a percentage in Hadoop

owfi6suc  posted on 2021-06-01  in Hadoop

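I have a project where I need to take a comma-separated file with many columns and pull out the company name, the outcome of the customer interaction, and how many times each outcome occurred.
Then I need to calculate the percentage of bad interactions versus good interactions, using Hadoop and Java.
I have a working map and reduce that give me the company name and the counts of good and bad interactions.
My problem is that I can't find a way to get Hadoop to divide the good and bad counts to give me a percentage.
Most companies don't have any bad interactions.
Here is my map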

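import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TermProjectMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        String[] columb = value.toString().split(",");
        String companyName = columb[5];
        String companyResponseToConsumer = columb[12];
        String lookfor = "closed without relief";

        // Collapse the response text into a Good/Bad label
        if (companyResponseToConsumer.toLowerCase().contains(lookfor)) {
            companyResponseToConsumer = "Bad";
        } else {
            companyResponseToConsumer = "Good";
        }
        //System.out.println(companyResponseToConsumer);
        // Use isEmpty() to test string contents; != "" only compares references
        if (!companyName.isEmpty() && !companyResponseToConsumer.isEmpty()) {
            // The key is "<company> <Good|Bad>", so each label is counted separately
            word.set(companyName + " " + companyResponseToConsumer);
            context.write(word, one);
        }
    }
}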

Here is my reducer

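import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TermProjectReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper for this "<company> <Good|Bad>" key
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        if (sum > 0) {
            result.set(sum);
            context.write(key, result);
        }
    }
}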

Here is an example of what I get now.

AMERICAN EAGLE MORTGAGE COMPANY,Good,   4
AMERICAN EQUITY MORTGAGE,Good,  26 
AMERICAN EXPRESS COMPANY,Bad,   250 
AMERICAN EXPRESS COMPANY,Good,  9094 
AMERICAN FEDERAL MORTGAGE CORPORATION,Bad,  1 
AMERICAN FEDERAL MORTGAGE CORPORATION,Good, 3 
AMERICAN FINANCE HOUSE LARIBA,Good, 3 
AMERICAN FINANCIAL MORTGAGE COMPANY,Good,   3

3ks5zfa0  #1

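To aggregate per company, you need to emit the company name alone as the key so that all of its records are merged at the reducer. In other words, you want the good and bad values under the same key rather than under separate keys as they are now.
I initially thought you could emit [1, 0] or [0, 1], but emitting just 1 or -1 instead of ("GOOD", 1) and ("BAD", 1) is easier to process (and moves less data through Hadoop).
For example,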

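private final static IntWritable ONE = new IntWritable(1);
private final static IntWritable NEG_ONE = new IntWritable(-1);

...

    // +1 marks a good interaction, -1 a bad one
    IntWritable status;
    if (companyResponseToConsumer.toLowerCase().contains(lookfor)) {
        status = NEG_ONE;
    } else {
        status = ONE;
    }

    if (!companyName.isEmpty()) {
        // The key is the company name only, so good and bad records meet in the same reduce call
        word.set(companyName);
        context.write(word, status); // write the Text key, not the raw String
    }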
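
Because the snippet above elides the surrounding mapper, here is a minimal, Hadoop-free sketch of the same per-line logic that can be run locally. The class name `InteractionClassifier`, the method `classify`, and the length guard for short rows are assumptions added for illustration; the column indices (5 and 12) and the "closed without relief" marker come from the question.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of the original job: mirrors the mapper's
// per-line decision without any Hadoop types so it can be tried off-cluster.
public class InteractionClassifier {
    static final int COMPANY_COL = 5;    // column index used in the question
    static final int RESPONSE_COL = 12;  // column index used in the question
    static final String BAD_MARKER = "closed without relief";

    /** Returns (company, +1 or -1), or empty for short rows or blank company names. */
    static Optional<Map.Entry<String, Integer>> classify(String csvLine) {
        // Naive split, as in the question; quoted fields containing commas are not handled.
        String[] columns = csvLine.split(",", -1);
        if (columns.length <= RESPONSE_COL) {
            return Optional.empty(); // assumption: skip malformed rows instead of throwing
        }
        String company = columns[COMPANY_COL];
        if (company.isEmpty()) {
            return Optional.empty();
        }
        int status = columns[RESPONSE_COL].toLowerCase().contains(BAD_MARKER) ? -1 : 1;
        return Optional.of(new SimpleEntry<>(company, status));
    }

    public static void main(String[] args) {
        // Made-up row: only columns 5 and 12 carry meaning here
        String row = "a,b,c,d,e,ACME BANK,g,h,i,j,k,l,Closed without relief,n";
        System.out.println(classify(row));
    }
}
```

In the real mapper the returned pair would correspond to `word.set(company)` followed by `context.write(word, status)`.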

Now, in the reducer, count the values and compute the percentage.

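import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The output value is Text here, so the job's output value class must be Text
// while its map output value class stays IntWritable.
public class TermProjectReducer extends Reducer<Text, IntWritable, Text, Text> {
    private final Text result = new Text();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        int good_sum = 0;
        for (IntWritable val : values) {
            good_sum += (val.get() == 1 ? 1 : 0); // count only the good (+1) records
            total += 1;
        }
        if (total > 0) { // Prevent division by zero
            double percent = 1.0 * good_sum / total; // fraction of good interactions, 0.0 to 1.0
            // Round it to however many decimal places you want
            result.set(String.valueOf(percent)); // convert the floating-point number to a string
        } else {
            result.set("0.00");
        }
        context.write(key, result);
    }
}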

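I only counted the good values, because you can compute (1 - good) = bad yourself in downstream processing.
Also, I'd suggest using DoubleWritable as the reducer's output value instead of Text.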
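
To make the (1 - good) = bad arithmetic concrete, the following is a small plain-Java sketch that needs no cluster. The class name `GoodBadShare` and the method `goodFraction` are hypothetical; the counts (9094 good, 250 bad) are the AMERICAN EXPRESS COMPANY figures from the sample output in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the same counting the reducer does, over plain Integers.
public class GoodBadShare {
    /** Fraction of +1 values among all values; 0.0 when there are no values. */
    static double goodFraction(Iterable<Integer> statuses) {
        long total = 0;
        long good = 0;
        for (int status : statuses) {
            total++;
            if (status == 1) {
                good++;
            }
        }
        return total == 0 ? 0.0 : (double) good / total;
    }

    public static void main(String[] args) {
        // Rebuild the +1 / -1 stream that the reducer would see for one company
        List<Integer> statuses = new ArrayList<>();
        for (int i = 0; i < 9094; i++) statuses.add(1);
        for (int i = 0; i < 250; i++) statuses.add(-1);

        double good = goodFraction(statuses);
        double bad = 1.0 - good; // the bad share follows from the good share
        System.out.printf("good=%.2f%% bad=%.2f%%%n", good * 100, bad * 100);
    }
}
```

If the reducer is switched to DoubleWritable as suggested, the same `double` could be written out directly rather than converted to a string first.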
