Dataset details:
Rows: 296211715
Unique users: 6988040
Cluster details:
Size: m5.8XL
Master: 1
Core: 8
Code:
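from pyspark.ml.feature import StringIndexer
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator

# Create numeric ids to replace the existing string ids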
indexer_user = StringIndexer(inputCol="userId", outputCol="userId_num")
indexer_movie = StringIndexer(inputCol="movieId", outputCol="movieId_num")
aekt_collab_rename_indexed = indexer_user.fit(aekt_collab_rename).transform(aekt_collab_rename)
aekt_collab_rename_indexed = indexer_movie.fit(aekt_collab_rename_indexed).transform(aekt_collab_rename_indexed)
(training, test) = aekt_collab_rename_indexed.randomSplit([0.8,0.2])
als = ALS(maxIter=5, regParam=0.01, userCol="userId_num", itemCol="movieId_num",
          ratingCol="rating", coldStartStrategy="drop")
model = als.fit(training)
# Evaluate the model by computing the RMSE on the test data
predictions = model.transform(test)
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",
                                predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(rmse))
# Generate top 10 movie recommendations for each user
userRecs = model.recommendForAllUsers(10)
userRecs.show(10)
Now, all of the code runs as expected until I try to display the userRecs data. The error is as follows:
py4j.protocol.Py4JJavaError: An error occurred while calling o277.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 13 in stage 75.0 failed 4 times, most recent failure: Lost task 13.3 in stage 75.0 (TID 4641, ip-172-31-10-178.us-west-2.compute.internal, executor 180): ExecutorLostFailure (executor 180 exited caused by one of the running tasks) Reason: Container from a bad node: container_1604516516252_0004_01_000235 on host: ip-172-31-10-178.us-west-2.compute.internal. Exit status: 137. Diagnostics: [2020-11-04 21:04:08.061]Container killed on request. Exit code is 137
[2020-11-04 21:04:08.061]Container exited with a non-zero exit code 137.
[2020-11-04 21:04:08.061]Killed by external signal
Is this a problem with my cluster setup? Any help would be greatly appreciated.
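
For context on the log above: exit status 137 follows the common shell convention that a status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9, SIGKILL, which matches the "Killed by external signal" line. On YARN this is often seen when a container exceeds its memory limit, but the log alone does not confirm that here, so treat it as a hypothesis. A minimal sketch of the decoding follows; `decode_exit_status` and the small signal table are hypothetical helpers written for illustration, not part of Spark or YARN.

```python
# Hypothetical helper: decode a shell-style exit status such as the 137 in the log.
# The table below lists only a few well-known POSIX signal numbers.
KNOWN_SIGNALS = {
    2: "SIGINT",
    9: "SIGKILL",
    15: "SIGTERM",
}


def decode_exit_status(status):
    """Return ("signal", number, name) for statuses above 128,
    otherwise ("exit", status, None) for a normal exit code."""
    if status > 128:
        signum = status - 128
        return ("signal", signum, KNOWN_SIGNALS.get(signum, "UNKNOWN"))
    return ("exit", status, None)


if __name__ == "__main__":
    kind, number, name = decode_exit_status(137)
    print(kind, number, name)
```

If memory is indeed the cause, the usual places to look would be the executor memory and memory overhead settings, since the failure occurs only when recommendForAllUsers is actually evaluated by show().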