I have a table like this:
id | fruit | buy_time
------------------------
1 | apple | 100
1 | banana | 105
2 | grapes | 102
2 | orange | 101
2 | apple | 110
My expected output (a list of maps, grouped by id):
id | buy_info
------------------------
1 | [{"fruit": "apple", "time": 100}, {"fruit": "banana", "time": 105}]
2 | [{"fruit": "orange", "time": 101}, {"fruit": "grapes", "time": 102}, {"fruit": "apple", "time": 110}]
1 answer
Use `.groupBy` together with the `to_json` (Spark 2.4+), `collect_list`, and `struct` functions. Example:
```scala
import org.apache.spark.sql.functions._

// In spark-shell, spark.implicits._ (needed for toDF) is already imported.
val df = Seq((1, "apple", 100), (1, "banana", 105), (2, "grapes", 102), (2, "orange", 101), (2, "apple", 110))
  .toDF("id", "fruit", "buy_time")

// Collect each id's rows into an array of structs, then serialize the array as a JSON string.
// collect_list does not guarantee the order of elements within each group.
df.groupBy("id")
  .agg(to_json(collect_list(struct(col("fruit"), col("buy_time").alias("time")))).alias("buy_info"))
  .show(10, false)
//+---+------------------------------------------------------------------------------------------+
//|id |buy_info                                                                                  |
//+---+------------------------------------------------------------------------------------------+
//|1  |[{"fruit":"apple","time":100},{"fruit":"banana","time":105}]                              |
//|2  |[{"fruit":"grapes","time":102},{"fruit":"orange","time":101},{"fruit":"apple","time":110}]|
//+---+------------------------------------------------------------------------------------------+
```
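The expected output in the question lists each id's purchases in ascending time order, whereas `collect_list` does not guarantee any order. Below is a minimal sketch of one way to get a sorted list, assuming Spark 2.4+ for the SQL `transform` higher-order function. It puts the time first in the struct so that `sort_array` orders by it, then rebuilds each struct with `fruit` first. The local `SparkSession` setup and the names `items` and `sorted` are assumptions made for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for this sketch (assumption); in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("buy-info").getOrCreate()
import spark.implicits._

val df = Seq(
  (1, "apple", 100), (1, "banana", 105),
  (2, "grapes", 102), (2, "orange", 101), (2, "apple", 110)
).toDF("id", "fruit", "buy_time")

val sorted = df
  .groupBy("id")
  // Time goes first so that sort_array compares the structs by time before fruit.
  .agg(sort_array(collect_list(struct(col("buy_time").alias("time"), col("fruit")))).alias("items"))
  // Rebuild each struct with fruit first so the JSON keys appear as in the expected output.
  .select(
    col("id"),
    to_json(expr("transform(items, x -> named_struct('fruit', x.fruit, 'time', x.time))")).alias("buy_info")
  )
  .orderBy("id")

sorted.show(false)
```

If the key order inside each JSON object does not matter, the `transform` step can be dropped and `to_json` applied directly to the sorted array.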