So now I have something that looks like this (MemberID, CourseID pairs):
1 3
1 4
2 3
2 5
2 6
Expected output: List(List(3,4), List(3,5,6))
Answer:
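import org.apache.spark.sql.functions._
// In spark-shell, `spark` is predefined; its implicits provide toDF and $"col"
import spark.implicits._

val df = Seq((1, 3), (1, 4), (2, 3), (2, 5), (2, 6)).toDF("MemberID", "CourseID")
df.show(false)

// Collect each member's CourseIDs into an array column.
// collect_list gives no ordering guarantee after a shuffle, so sort_array makes each
// list deterministic, and orderBy fixes the order of the members.
val resDF = df.groupBy("MemberID").agg(sort_array(collect_list($"CourseID")).alias("CourseID")).orderBy("MemberID")

// Read the array column back as Seq[Int]. Using concat_ws here would instead
// produce strings such as "3,4", giving a List[List[Any]] of strings.
val result: List[List[Int]] = resDF.select($"CourseID").collect().toList.map(_.getSeq[Int](0).toList)

// +--------+--------+
// |MemberID|CourseID|
// +--------+--------+
// |1       |3       |
// |1       |4       |
// |2       |3       |
// |2       |5       |
// |2       |6       |
// +--------+--------+
//
// df: org.apache.spark.sql.DataFrame = [MemberID: int, CourseID: int]
// resDF: org.apache.spark.sql.DataFrame = [MemberID: int, CourseID: array<int>]
// result: List[List[Int]] = List(List(3, 4), List(3, 5, 6))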
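If the pairs are an ordinary Scala collection rather than a Spark DataFrame, the same grouping can likely be done with the standard library alone. The following is a sketch that swaps Spark's groupBy/collect_list for the collections `groupBy` method. The names `GroupCourses` and `groupCourses` are hypothetical and do not come from the original answer.

```scala
object GroupCourses {
  // Hypothetical helper: group (MemberID, CourseID) pairs by MemberID and
  // return each member's CourseIDs, with members ordered by MemberID.
  // Within a member, courses keep their original input order.
  def groupCourses(pairs: Seq[(Int, Int)]): List[List[Int]] =
    pairs
      .groupBy { case (memberId, _) => memberId } // Map with no guaranteed key order
      .toList
      .sortBy { case (memberId, _) => memberId }  // fix the outer order
      .map { case (_, rows) => rows.map { case (_, courseId) => courseId }.toList }

  def main(args: Array[String]): Unit = {
    val pairs = Seq((1, 3), (1, 4), (2, 3), (2, 5), (2, 6))
    println(groupCourses(pairs)) // prints List(List(3, 4), List(3, 5, 6))
  }
}
```

Scala's `groupBy` keeps each group's elements in traversal order, so only the outer order needs an explicit sort here. In the Spark version, both levels need sorting because row order after a shuffle is not guaranteed.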