Calculate the average rating per userid on an RDD and map it

a1o7rhls posted on 2021-05-26 in Spark

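First of all, I have only just started learning Scala, so I apologize if this question is too basic.
I have rating data, val ratings: RDD[Rating], where each Rating is (userid, movieid, rating). I want to compute each user's average rating and create an RDD of (userid, average_rating).
After that, I want to filter the rating data by each user's average: for example, if a user's average rating is 2.0, I only want to keep that user's rows whose rating is greater than or equal to 2.0.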
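To pin down the intended result, here is a minimal sketch of the two steps on plain Scala collections, with no Spark involved. The Rating case class below is a hypothetical stand-in that only mirrors the field names (user, product, rating) of Spark MLlib's Rating, and the object and method names are illustrative.

```scala
// Hypothetical stand-in with the same field names as MLlib's Rating.
final case class Rating(user: Int, product: Int, rating: Double)

object LocalAverageFilter {
  // Step 1: average rating per user (sum of ratings / number of ratings).
  def userAverages(ratings: Seq[Rating]): Map[Int, Double] =
    ratings.groupBy(_.user).map { case (user, rs) =>
      user -> rs.map(_.rating).sum / rs.size
    }

  // Step 2: keep only the ratings at or above that same user's average.
  def aboveAverage(ratings: Seq[Rating]): Seq[Rating] = {
    val avg = userAverages(ratings)
    ratings.filter(r => r.rating >= avg(r.user))
  }
}
```

With user 1 rating 1.0 and 3.0 (average 2.0) and user 2 rating 4.0 twice (average 4.0), only user 1's 1.0 rating is dropped. On an RDD the same logic needs reduceByKey for the per-user sums and counts and then a lookup or a join for the filter, because the averages live in a separate dataset.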
Here is my attempt so far:

import org.apache.spark.mllib.recommendation.Rating   // assumed: MLlib's Rating(user, product, rating)

val ratings = DATA.filter(row => row != first_header).map { line =>
  val fields = line.split(",")   // split each CSV line once
  Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble)
}

// Calculating User Average
val counts = ratings.map(item => (item.user, item.rating))
val goodratingsum = counts
  .mapValues(value => (value, 1))   // map entry with a count of 1
  .reduceByKey { case ((sumL, countL), (sumR, countR)) =>
    (sumL + sumR, countL + countR)
  }
val goodratings = goodratingsum
  .mapValues { case (sum, count) => sum / count }
  .collect

// Trying to create a new RDD which is filtered according to each user average of ratings.
// avguserrat is the reduced (userid, average) data
val goodRatings = ratings.filter(r => r.user == avguserrat._1 && r.rating >= avguserrat._2)
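Because the per-user averages are already being brought back to the driver with collect, one option is to collect them as a Map instead of an Array and look each user up inside filter. The sketch below assumes spark-core on the classpath; the Rating case class is a hypothetical stand-in for MLlib's, and BroadcastAverageFilter is an illustrative name. collectAsMap, broadcast and Broadcast.value are existing Spark APIs.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical stand-in with the same field names as MLlib's Rating.
final case class Rating(user: Int, product: Int, rating: Double)

object BroadcastAverageFilter {
  def aboveAverage(sc: SparkContext, ratings: RDD[Rating]): RDD[Rating] = {
    // (sum, count) per user, then sum / count, as in the question's code.
    val averages: scala.collection.Map[Int, Double] = ratings
      .map(r => (r.user, (r.rating, 1)))
      .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
      .mapValues { case (sum, count) => sum / count }
      .collectAsMap()   // userid -> average, held on the driver

    // Ship the lookup table to each executor once rather than with every task.
    val avgB = sc.broadcast(averages)
    ratings.filter(r => r.rating >= avgB.value(r.user))
  }
}
```

This only makes sense while the number of users is small enough for the Map to fit in driver and executor memory; otherwise a join keeps everything distributed.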

Error: when I try to get the user ID and the average rating out of the reduced data avguserrat, I get:

-value _1 is not a member of org.apache.spark.rdd.RDD[(Int, Double)]
-value _2 is not a member of org.apache.spark.rdd.RDD[(Int, Double)]

Why can't I access each userid and its average rating?
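The messages say that avguserrat is an RDD[(Int, Double)]: a distributed collection of many (userid, average) pairs, not a single pair. _1 and _2 are members of Tuple2, not of RDD, so they cannot be called on it. Spark also does not support using one RDD inside another RDD's transformation, so even a valid reference to avguserrat inside ratings.filter would fail. A common fix is to key both datasets by userid and join them. The sketch below assumes spark-core on the classpath; Rating is a hypothetical stand-in for MLlib's class and JoinAverageFilter is an illustrative name.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical stand-in with the same field names as MLlib's Rating.
final case class Rating(user: Int, product: Int, rating: Double)

object JoinAverageFilter {
  def aboveAverage(ratings: RDD[Rating]): RDD[Rating] = {
    // One (userid, average) element per user, i.e. what avguserrat holds.
    val avgByUser: RDD[(Int, Double)] = ratings
      .map(r => (r.user, (r.rating, 1)))
      .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
      .mapValues { case (sum, count) => sum / count }

    ratings
      .map(r => (r.user, r))        // key each rating by its user
      .join(avgByUser)              // RDD[(userid, (Rating, average))]
      .filter { case (_, (r, avg)) => r.rating >= avg }
      .map { case (_, (r, _)) => r } // drop the key and the average
  }
}
```

The join pairs each rating with its own user's average, so the comparison happens element by element inside a single RDD, which is what the original filter was trying to express.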
