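Given a Spark DataFrame, one of its columns may or may not contain nested JSON, and the structure of that nested JSON is dynamic. The goal is to break the JSON apart and produce a new DataFrame with one new column for each leaf key of the nested JSON.
Because the JSON is dynamic, the set of columns in the resulting table is dynamic too. Please also keep in mind that the DataFrame holds more than 100 million records.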
For example:
Input:
+-----+------+------+-------
| id  | key  | type | value
+-----+------+------+-------
| f9f | BUSI | off  | false
| f96 | NAME | 50   | true
| f9z | BANK | off  | {"Name":"United School","admNumber":"197108","details":{"code":"WEREFFW32","studentName":"Abhishek kumar","doc":"certificate","admId":"3424325328","stat":0,"studentDetails":false} }
+-----+------+------+-------
Output:
+-----+------+------+-------+---------------+-----------+-----------+----------------+-------------+------------+------+----------------+
| id  | key  | type | value | Name          | admNumber | code      | studentName    | doc         | admId      | stat | studentDetails |
+-----+------+------+-------+---------------+-----------+-----------+----------------+-------------+------------+------+----------------+
| f9f | BUSI | off  | false | NULL          | NULL      | NULL      | NULL           | NULL        | NULL       | NULL | NULL           |
| f96 | NAME | 50   | true  | NULL          | NULL      | NULL      | NULL           | NULL        | NULL       | NULL | NULL           |
| f9z | BANK | off  | NULL  | United School | 197108    | WEREFFW32 | Abhishek kumar | certificate | 3424325328 | 0    | false          |
+-----+------+------+-------+---------------+-----------+-----------+----------------+-------------+------------+------+----------------+
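The transformation in the example comes down to three rules: flatten each JSON object down to its leaf keys (`details.code` becomes `code`), take the union of leaf keys over all rows as the dynamic set of new columns, and fill NULL wherever a row lacks a key (the original `value` becomes NULL once it has been expanded). Below is a minimal single-machine sketch of those rules in plain Python with the standard `json` module, not Spark. The function names `flatten_json`, `parse_object` and `explode_json_column` are hypothetical, and it assumes leaf key names are unique across nesting levels, as they are in the sample. At 100M+ rows in Spark, the same two passes would typically map to inferring one schema from the JSON-valued rows (for example with `spark.read.json` over those strings, possibly on a sample) and then applying it with `pyspark.sql.functions.from_json` and selecting the nested fields, instead of collecting rows to the driver.

```python
import json
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def flatten_json(obj: Dict[str, Any], out: Optional[Row] = None) -> Row:
    """Collect the leaf keys of a nested JSON object into one flat dict.

    Nested objects are descended into and only the leaf key name is kept
    (details.code -> code), as in the expected output. Assumption: leaf
    names are unique; a repeated name would overwrite the earlier value.
    """
    if out is None:
        out = {}
    for key, val in obj.items():
        if isinstance(val, dict):
            flatten_json(val, out)
        else:
            out[key] = val
    return out


def parse_object(text: Any) -> Optional[Dict[str, Any]]:
    """Return the parsed dict if `text` is a JSON object, otherwise None."""
    try:
        parsed = json.loads(text)
    except (TypeError, ValueError):  # not a string, or not valid JSON
        return None
    # Scalars such as "false" or "true" parse fine but are not expanded.
    return parsed if isinstance(parsed, dict) else None


def explode_json_column(rows: List[Row], col: str = "value") -> List[Row]:
    # Pass 1: flatten each JSON-object value and record the union of leaf
    # keys in first-seen order; this union is the dynamic set of new columns.
    flat_per_row: List[Optional[Row]] = []
    new_cols: Dict[str, None] = {}  # dict used as an insertion-ordered set
    for row in rows:
        obj = parse_object(row.get(col))
        flat = flatten_json(obj) if obj is not None else None
        flat_per_row.append(flat)
        for key in flat or {}:
            new_cols.setdefault(key, None)

    # Pass 2: add every new column to every row, using None (NULL) where the
    # row has no such key; a value that was expanded into columns becomes None.
    result: List[Row] = []
    for row, flat in zip(rows, flat_per_row):
        out = dict(row)
        if flat is not None:
            out[col] = None
        for key in new_cols:
            out[key] = flat.get(key) if flat is not None else None
        result.append(out)
    return result


# The three sample rows from the question.
rows = [
    {"id": "f9f", "key": "BUSI", "type": "off", "value": "false"},
    {"id": "f96", "key": "NAME", "type": "50", "value": "true"},
    {"id": "f9z", "key": "BANK", "type": "off",
     "value": '{"Name":"United School","admNumber":"197108","details":'
              '{"code":"WEREFFW32","studentName":"Abhishek kumar",'
              '"doc":"certificate","admId":"3424325328","stat":0,'
              '"studentDetails":false} }'},
]

for out_row in explode_json_column(rows):
    print(out_row)
```

Each printed row carries the same 12 columns as the expected output, in the same order, with `None` standing in for NULL.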