Why doesn't aggregateByKey return the correct result?

92dk7w1h · posted 2021-05-19 in Spark

I have the following pair RDD:

val pairRDD = prFilterRDD.map(x => (x(1), x(4).toFloat))

// x(1) = categoryid
// x(4).toFloat = price

Result:

    (2,124.99)
    (17,129.99)
    (2,129.99)
    (17,199.99)
    (17,299.99)
    (17,299.99)
    (2,139.99)
    (17,149.99)
    (17,119.99)
    (17,399.99)
    (3,189.99)
    (17,119.99)
    (3,159.99)
    (18,129.99)
    (18,189.99)
    (3,199.99)
    (18,134.99)
    (18,129.99)
    (18,129.99)
    (18,139.99)
    (3,149.99)
    (18,129.99)
    (3,159.99)
    (18,124.99)
    (4,299.98)
    (18,129.99)

I want to compute the sum of the prices per category. I wrote the following code:

val initialVal = 0.0f

val comb = (initialVal: Float, strVal: Float) => initialVal + strVal

val mergeValSum = (v1: Float, v2: Float) => v1 + v2

val output = pairRDD.aggregateByKey(initialVal)(comb, mergeValSum)

The result is:

    (4,5689.7803)
    (8,1184.95)
    (19,1799.87)
    (48,6599.831)
    (51,1499.93)
    (22,3114.95)
    (33,587.97003)
    (44,1744.8999)
    (11,5619.8115)
    (49,2789.89)
    (5,2314.89)

I am not getting the expected result. For example, for category id = 8 the expected result is 792.0, but I get 1184.95. Am I using aggregateByKey correctly?
Thanks for your answers.
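As a sanity check of the call itself, here is a minimal sketch in plain Scala (no Spark) of what `aggregateByKey(initialVal)(comb, mergeValSum)` computes: within each partition the values of a key are folded with the seq-op starting from the zero value, and the per-partition results are then merged with the comb-op. The two-partition split and the helper name `aggregateByKeySketch` are illustrative assumptions, not Spark's actual implementation.

```scala
// Simulate aggregateByKey semantics on a local Seq instead of an RDD.
// seqOp folds values into the zero value inside a "partition";
// combOp merges the per-partition accumulators for the same key.
def aggregateByKeySketch[K](pairs: Seq[(K, Float)], zero: Float)
                           (seqOp: (Float, Float) => Float,
                            combOp: (Float, Float) => Float): Map[K, Float] = {
  // Pretend the data lives in two partitions.
  val (p1, p2) = pairs.splitAt(pairs.length / 2)
  def foldPartition(p: Seq[(K, Float)]): Map[K, Float] =
    p.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).foldLeft(zero)(seqOp) }
  // Merge the second partition's accumulators into the first's with combOp.
  foldPartition(p2).foldLeft(foldPartition(p1)) { case (acc, (k, v)) =>
    acc.updated(k, acc.get(k).map(combOp(_, v)).getOrElse(v))
  }
}

// A few rows from the sample data above; each key ends up with the sum
// of its prices, e.g. key 2 -> 124.99 + 129.99 + 139.99.
val sums = aggregateByKeySketch(
  Seq((2, 124.99f), (17, 129.99f), (2, 129.99f), (2, 139.99f)), 0.0f)(_ + _, _ + _)
println(sums)
```

With a sum as both seq-op and comb-op, as in the question, this setup does produce per-key totals, so the combinators themselves look correct; a mismatch between expected and actual totals usually points to the input data (e.g. what `prFilterRDD` actually contains) rather than to `aggregateByKey`.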
