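First of all, I have only just started learning Scala, so I apologize if this question is very basic.
I have rating data, i.e. val ratings: RDD[Rating], where each Rating holds (userid, movieid, rating).
I want to compute the average rating for each user and create an RDD of (userid, average_rating).
After that, I want to filter the rating data by each user's average: for example, if a user's average rating is 2.0, I only want to keep that user's rows whose rating is greater than or equal to that average.
Here is what I have tried so far: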
val ratings = DATA.filter(row => row != first_header).map { fields =>
  // Split each CSV line once instead of once per column
  val cols = fields.split(",")
  new Rating(cols(0).toInt, cols(1).toInt, cols(2).toDouble)
}
// Calculating the user average
val counts = ratings.map(item => (item.user, item.rating))
val goodratingsum = counts
  .mapValues(value => (value, 1)) // map entry with a count of 1
  .reduceByKey {
    case ((sumL, countL), (sumR, countR)) =>
      (sumL + sumR, countL + countR)
  }
val goodratings = goodratingsum
  .mapValues {
    case (sum, count) => sum / count
  }
  .collect
// Trying to create a new RDD which is filtered according to each user average of ratings.
val goodRatings = ratings.filter(r => r.user == avguserrat._1 && ((r.rating : Double) >= avguserrat._2))
Error: when I try to get the user ID and average rating from the reduced data avguserrat, I get:
- value _1 is not a member of org.apache.spark.rdd.RDD[(Int, Double)]
- value _2 is not a member of org.apache.spark.rdd.RDD[(Int, Double)]
Why can't I access the userid values and their average ratings?
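For what it is worth, the error message itself points at the cause: avguserrat is an RDD of (Int, Double) pairs, not a single pair, so _1 and _2 are not defined on it. In addition, Spark does not allow one RDD to be referenced inside a transformation of another RDD. One possible way around both issues is to key the ratings by user and join them with the per-user averages. The following is only a minimal sketch under assumptions: the local Rating case class is a hypothetical stand-in for whatever Rating class the question uses, and the object name, function name and sample data are made up for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-in for the Rating class used in the question.
case class Rating(user: Int, product: Int, rating: Double)

object FilterByUserAverage {

  // Keep only the ratings that are at or above the rating user's own average.
  def atOrAboveUserAverage(ratings: RDD[Rating]): RDD[Rating] = {
    // (user, (sum, count)) reduced per user, then turned into (user, average)
    val avgByUser: RDD[(Int, Double)] = ratings
      .map(r => (r.user, (r.rating, 1)))
      .reduceByKey { case ((sumL, countL), (sumR, countR)) =>
        (sumL + sumR, countL + countR)
      }
      .mapValues { case (sum, count) => sum / count }

    ratings
      .keyBy(_.user)                                   // (user, rating row)
      .join(avgByUser)                                 // (user, (rating row, average))
      .filter { case (_, (r, avg)) => r.rating >= avg } // compare with own average
      .map { case (_, (r, _)) => r }                   // drop the key and the average
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a quick experiment.
    val conf = new SparkConf().setAppName("FilterByUserAverage").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data: user 1 averages 2.0, user 2 averages 4.0
      val ratings = sc.parallelize(Seq(
        Rating(1, 10, 1.0),
        Rating(1, 11, 3.0),
        Rating(2, 10, 4.0)
      ))
      atOrAboveUserAverage(ratings).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With the sample data above, the ratings that survive are Rating(1,11,3.0) and Rating(2,10,4.0), although the order in which they are printed is not guaranteed. If the number of distinct users is small, another option might be to call collectAsMap() on the averages and look each user up in the resulting map inside the filter, which avoids the join.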