How to aggregate values into a list of maps after group by?

rhfm7lfc  asked on 2021-05-27  in Spark

I have a table like this:

id  | fruit  | buy_time
------------------------
1   | apple  | 100
1   | banana | 105        
2   | grapes | 102
2   | orange | 101
2   | apple  | 110

My expected output (a list of maps grouped by id):

id  | buy_info
------------------------
1   | [{"fruit": "apple", "time": 100}, {"fruit": "banana", "time": 105}]
2   | [{"fruit": "orange", "time": 101}, {"fruit": "grapes", "time": 102}, {"fruit": "apple", "time": 110}]

jmo0nnb3 #1

Use the `groupBy` + `collect_list` + `struct` + `to_json` (Spark 2.4+) functions. Example:

```
import org.apache.spark.sql.functions._

// Sample data matching the question's table
val df = Seq((1, "apple", 100), (1, "banana", 105), (2, "grapes", 102), (2, "orange", 101), (2, "apple", 110))
  .toDF("id", "fruit", "buy_time")

df.groupBy("id")
  .agg(to_json(collect_list(struct(col("fruit"), col("buy_time").alias("time")))).alias("buy_info"))
  .show(10, false)
//+---+------------------------------------------------------------------------------------------+
//|id |buy_info                                                                                  |
//+---+------------------------------------------------------------------------------------------+
//|1  |[{"fruit":"apple","time":100},{"fruit":"banana","time":105}]                              |
//|2  |[{"fruit":"grapes","time":102},{"fruit":"orange","time":101},{"fruit":"apple","time":110}]|
//+---+------------------------------------------------------------------------------------------+
```
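One caveat: `collect_list` does not guarantee any particular row order within a group, so the elements for id 2 may not come back time-ascending as in the expected output. If that ordering matters, one option (a sketch, assuming Spark 2.4+; `sort_array` orders structs by their fields left to right, so `time` is placed first to make it the sort key, which also swaps the key order in the resulting JSON) is:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("sorted-maps").getOrCreate()
import spark.implicits._

val df = Seq((1, "apple", 100), (1, "banana", 105), (2, "grapes", 102), (2, "orange", 101), (2, "apple", 110))
  .toDF("id", "fruit", "buy_time")

// sort_array sorts the array of structs by struct fields in declaration
// order, so putting `time` first sorts each group's purchases by time.
val result = df.groupBy("id")
  .agg(to_json(sort_array(collect_list(
    struct(col("buy_time").alias("time"), col("fruit"))
  ))).alias("buy_info"))
```

The trade-off is that each JSON object now reads `{"time":101,"fruit":"orange"}` instead of fruit-first; if the exact key order matters downstream, the array would need to be re-projected after sorting.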
