Removing empty array fields from a DataFrame when converting it to JSON

Posted by bakd9h0s on 2021-05-27 in Spark

Is there a way to create JSON from a Spark DataFrame that leaves out empty fields?
Suppose I have this DataFrame:

+-------+----------------+
|   name|       hit_songs|
+-------+----------------+
|beatles|[help, hey jude]|
|  romeo|      [eres mia]|
| juliet|            null|
+-------+----------------+

I want to convert it to JSON like this:

[
  {
    "name": "beatles",
    "hit_songs": ["help", "hey jude"]
  },
  {
    "name": "romeo",
    "hit_songs": ["eres mia"]
  },
  {
    "name": "juliet"
  }
]

If a value in the json_object is null, I don't want that field to appear in the json_object at all.
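To pin down the target format independently of Spark, here is a minimal plain-Python sketch of the requested behavior. The helper name `drop_null_fields` and the list of dicts are illustrative assumptions, not part of the original question; `None` stands in for Spark's null.

```python
import json


def drop_null_fields(record):
    """Return a copy of record without the keys whose value is None."""
    return {key: value for key, value in record.items() if value is not None}


# Rows mirroring the example DataFrame; None stands in for Spark's null.
rows = [
    {"name": "beatles", "hit_songs": ["help", "hey jude"]},
    {"name": "romeo", "hit_songs": ["eres mia"]},
    {"name": "juliet", "hit_songs": None},
]

# Serialize the cleaned rows as one JSON array.
target = json.dumps([drop_null_fields(row) for row in rows])
print(target)
```

The last element of the printed array contains only the `name` key, which is the shape the question asks Spark to produce.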


6za6bjd0 #1

Use the to_json function for this case; by default it omits fields whose value is null.

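from pyspark.sql import SparkSession
from pyspark.sql.functions import collect_list, lit, struct, to_json

spark = SparkSession.builder.getOrCreate()  # already defined in the pyspark shell
df = spark.createDataFrame([("beatles", ["help", "hey juude"]), ("romeo", ["eres mia"]), ("juliet", None)], ["name", "hit_songs"])

# groupBy(lit(1)) puts every row in one group; each row is serialized with to_json and collected into an array.
df.groupBy(lit(1)) \
    .agg(collect_list(to_json(struct("name", "hit_songs"))).alias("json")) \
    .drop("1") \
    .show(10, False)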

# +-------------------------------------------------------------------------------------------------------------------+
# |json                                                                                                               |
# +-------------------------------------------------------------------------------------------------------------------+
# |[{"name":"beatles","hit_songs":["help","hey juude"]}, {"name":"romeo","hit_songs":["eres mia"]}, {"name":"juliet"}]|
# +-------------------------------------------------------------------------------------------------------------------+

# Alternatively, collect the rows as structs and serialize the whole DataFrame with toJSON().
df.groupBy(lit(1)) \
    .agg(collect_list(struct("name", "hit_songs")).alias("json")) \
    .drop("1") \
    .toJSON() \
    .collect()

# [u'{"json":[{"name":"beatles","hit_songs":["help","hey juude"]},{"name":"romeo","hit_songs":["eres mia"]},{"name":"juliet"}]}']
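Note that the toJSON() result wraps the array in an outer object under the "json" key. If only the bare array from the question is wanted, one way to unwrap it on the driver is sketched below in plain Python. The string is copied from the output above; the variable names are illustrative assumptions.

```python
import json

# The single-element list returned by toJSON().collect() in the answer above.
collected = [
    '{"json":[{"name":"beatles","hit_songs":["help","hey juude"]},'
    '{"name":"romeo","hit_songs":["eres mia"]},{"name":"juliet"}]}'
]

# Unwrap the outer {"json": [...]} object to get the list of records.
records = json.loads(collected[0])["json"]

# Re-serialize just the array, which is the shape the question asked for.
array_json = json.dumps(records)

# Rows whose hit_songs was null carry no such key at all.
missing = [record["name"] for record in records if "hit_songs" not in record]

print(array_json)
print(missing)
```

This only makes sense when the collected result is small enough to handle on the driver, which is already the assumption behind calling collect().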
