Using groupBy and agg on multiple columns in Spark Scala

eqfvzcg8  posted on 2021-05-27 in Spark

I have a DataFrame with 4 columns. I want to apply groupBy on 2 of the columns and collect the other columns as lists. For example, I have a df like this:

+---+-------+--------+-----------+
|id |fName  |lName   |dob        |
+---+-------+--------+-----------+
|1  |Akash  |Sethi   |23-05-1995 |
|2  |Kunal  |Kapoor  |14-10-1992 |
|3  |Rishabh|Verma   |11-08-1994 |
|2  |Sonu   |Mehrotra|14-10-1992 |
+---+-------+--------+-----------+

I want my output to look like this:

+---+-----------+--------------+-------------------+
|id |dob        |fname         |lName              |
+---+-----------+--------------+-------------------+
|1  |23-05-1995 |[Akash]       |[Sethi]            |
|2  |14-10-1992 |[Kunal, Sonu] |[Kapoor, Mehrotra] |
|3  |11-08-1994 |[Rishabh]     |[Verma]            |
+---+-----------+--------------+-------------------+
9gm1akwq  #1

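You can do this with agg and collect_list. Adding aliases keeps the original column names instead of the generated collect_list(...) names: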

import org.apache.spark.sql.functions.{col, collect_list}
df.groupBy("id", "dob").agg(collect_list(col("fName")).as("fname"), collect_list(col("lName")).as("lName"))
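
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the aggregation above. The object name `GroupByCollectExample`, the helper `collectNames`, the app name, and the local master setting are illustrative assumptions and not part of the original question. It rebuilds the sample df, groups by `id` and `dob`, and collects the two name columns into arrays.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list}

// Hypothetical wrapper object, used here only to make the sketch runnable.
object GroupByCollectExample {

  // Groups by (id, dob) and collects the remaining columns into arrays.
  // The aliases give the result columns the names from the desired output.
  def collectNames(df: DataFrame): DataFrame =
    df.groupBy("id", "dob")
      .agg(
        collect_list(col("fName")).as("fname"),
        collect_list(col("lName")).as("lName")
      )
      .orderBy("id") // sorting is only for stable display

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("groupby-collect-list") // assumed app name
      .master("local[*]")              // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df = Seq(
      (1, "Akash", "Sethi", "23-05-1995"),
      (2, "Kunal", "Kapoor", "14-10-1992"),
      (3, "Rishabh", "Verma", "11-08-1994"),
      (2, "Sonu", "Mehrotra", "14-10-1992")
    ).toDF("id", "fName", "lName", "dob")

    collectNames(df).show(truncate = false)

    spark.stop()
  }
}
```

Two things to keep in mind: `collect_list` keeps duplicates and skips nulls (use `collect_set` if you want distinct values), and the order of elements inside each list is not guaranteed after a shuffle, so `[Kunal, Sonu]` may come back as `[Sonu, Kunal]` on a real cluster.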
