这个问题在这里已经有答案了:
spark sql替换mysql的组concat聚合函数(8个答案)
两年前关门了。
我需要从源表创建一个表(hive table/spark dataframe),该表将多行用户的数据存储到单行列表中。
User table:
Schema: userid: string | transactiondate:string | charges: string |events:array<struct<name:string,value:string>>
----|------------|-------| ---------------------------------------
123 | 2017-09-01 | 20.00 | [{"name":"chargeperiod","value":"this"}]
123 | 2017-09-01 | 30.00 | [{"name":"chargeperiod","value":"last"}]
123 | 2017-09-01 | 20.00 | [{"name":"chargeperiod","value":"recent"}]
123 | 2017-09-01 | 30.00 | [{"name":"chargeperiod","value":"0"}]
456 | 2017-09-01 | 20.00 | [{"name":"chargeperiod","value":"this"}]
456 | 2017-09-01 | 30.00 | [{"name":"chargeperiod","value":"last"}]
456 | 2017-09-01 | 20.00 | [{"name":"chargeperiod","value":"recent"}]
456 | 2017-09-01 | 30.00 | [{"name":"chargeperiod","value":"0"}]
输出表应为
userid:String | concatenatedlist :List[Row]
-------|-----------------
123 | [[2017-09-01,20.00,[{"name":"chargeperiod","value":"this"}]],[2017-09-01,30.00,[{"name":"chargeperiod","value":"last"}]],[2017-09-01,20.00,[{"name":"chargeperiod","value":"recent"}]], [2017-09-01,30.00, [{"name":"chargeperiod","value":"0"}]]]
456 | [[2017-09-01,20.00,[{"name":"chargeperiod","value":"this"}]],[2017-09-01,30.00,[{"name":"chargeperiod","value":"last"}]],[2017-09-01,20.00,[{"name":"chargeperiod","value":"recent"}]], [2017-09-01,30.00, [{"name":"chargeperiod","value":"0"}]]]
spark版本:1.6.2
1条答案
按热度按时间tktrz96b1#