Running the ml-10m dataset with Apache Flink ALS, but only getting 13 predictions

Posted by ttvkxqim on 2021-06-24 in Flink


When using Apache Flink ALS with the following code:

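```scala
import org.apache.flink.api.scala._
import org.apache.flink.ml.common.ParameterMap
import org.apache.flink.ml.recommendation.ALS

val env = ExecutionEnvironment.getExecutionEnvironment

// Training data: each line is "userid::itemid::rating::timestamp"
val inputDS: DataSet[String] = env.readTextFile("/home/master/dataset/ml-10m/trainset")
val inputDS1: DataSet[(Int, Int, Double)] = inputDS.map { t =>
  val split = t.split("::")
  (split(0).toInt, split(1).toInt, split(2).toDouble)
}

val als = ALS()
  .setIterations(5)
  .setNumFactors(10)
  .setBlocks(300)

// Set the other parameters via a parameter map
val parameters = ParameterMap()
  .add(ALS.Lambda, 0.2)
  .add(ALS.Seed, 42L)

// Calculate the factorization
als.fit(inputDS1, parameters)

// Test data: only (userid, itemid) is needed for prediction
val inputTestDS: DataSet[String] = env.readTextFile("/home/master/dataset/ml-10m/testset")
val testingDS: DataSet[(Int, Int)] = inputTestDS.map { t =>
  val split = t.split("::")
  (split(0).toInt, split(1).toInt)
}

val predictedRatings = als.predict(testingDS)
// In the DataSet API, print() collects the result to the client and runs its own job
predictedRatings.print()
// writeAsText() only registers a sink; env.execute() runs it as a second job,
// so the whole ALS pipeline is computed twice
predictedRatings.writeAsText("path to result")
env.execute()
```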
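
The question does not show what ALS.predict does internally. As a rough, Flink-free illustration, the plain Scala sketch below assumes that predict scores each (userid, itemid) pair as the dot product of the learned user and item factor vectors and, like an inner join, silently drops pairs whose user or item id never appeared in the training set. The object AlsPredictSketch, its helper functions, the factor values, and the test pairs are all hypothetical; this is not Flink ML's implementation.

```scala
object AlsPredictSketch {
  // Hypothetical representation of learned latent factors: id -> factor vector
  type Factors = Map[Int, Array[Double]]

  // Predicted rating = dot product of the user's and the item's factor vectors
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Mimics an inner join: a (user, item) pair yields a prediction only if
  // both ids received factors during training; all other pairs are dropped.
  def predict(users: Factors, items: Factors, pairs: Seq[(Int, Int)]): Seq[(Int, Int, Double)] =
    for {
      (u, i) <- pairs
      uf <- users.get(u).toSeq
      itf <- items.get(i).toSeq
    } yield (u, i, dot(uf, itf))

  def main(args: Array[String]): Unit = {
    // Hypothetical 2-factor model with two known users and two known items
    val users: Factors = Map(1 -> Array(1.0, 0.5), 2 -> Array(0.5, 2.0))
    val items: Factors = Map(10 -> Array(2.0, 1.0), 20 -> Array(1.0, 1.0))
    // User 3 and item 99 are "unseen", so those pairs get no prediction
    val testPairs = Seq((1, 10), (2, 20), (3, 10), (1, 99))
    val preds = predict(users, items, testPairs)
    preds.foreach(println)
    println(s"${preds.size} of ${testPairs.size} test pairs got a prediction")
  }
}
```

If the real pipeline behaves this way, comparing testingDS.count() with predictedRatings.count() in the job above would show how many test pairs were dropped rather than scored.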

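However, the result file only contains predictions for the last 13 records. Is the data too large for Apache Flink running in IDEA? The training set has 8,000,000 observations, and the test set has 20,000,000 observations. Each line has the format "userid::itemid::rating::timestamp", and my computer has 8 GB of RAM. Or is there a bug in my code? Please let me know, thanks.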
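
To put the "is the data too large?" question into numbers, here is a back-of-envelope estimate in plain Scala. It assumes the MovieLens 10M user and movie counts (71,567 users and 10,681 movies), counts only the primitive fields of each (Int, Int, Double) rating, and ignores JVM object headers, Flink's serialization and network buffers, and the blocking set by setBlocks(300). The object name Ml10mSizeEstimate is hypothetical.

```scala
object Ml10mSizeEstimate {
  val trainRatings = 8000000L // training observations, from the question
  val numUsers = 71567L       // assumption: MovieLens 10M user count
  val numItems = 10681L       // assumption: MovieLens 10M movie count
  val numFactors = 10L        // setNumFactors(10) in the question

  // (Int, Int, Double) payload: 4 + 4 + 8 bytes per rating, no object overhead
  def ratingPayloadBytes: Long = trainRatings * (4 + 4 + 8)

  // One Double per latent factor for every user and every item
  def factorBytes: Long = (numUsers + numItems) * numFactors * 8

  def main(args: Array[String]): Unit = {
    println(f"raw training ratings: ${ratingPayloadBytes / 1e6}%.1f MB")
    println(f"user + item factors:  ${factorBytes / 1e6}%.1f MB")
  }
}
```

Under these assumptions the raw training ratings come to about 128 MB and the 10-factor matrices to about 6.6 MB. Actual memory use can be substantially higher once object overhead and Flink's own buffers are included, so this only suggests that the raw data volume by itself is not obviously beyond 8 GB of RAM.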

apache-flink

Source: https://stackoverflow.com/questions/49859928/running-ml-10m-dataset-using-apache-flink-als-but-get-only-13-predictions