How to extract selected elements from an array using "explode" in PySpark

Posted by sg3maiej on 2023-10-15 in Spark


I have a Spark DataFrame spdf whose data looks like this:

player_name    team_history
John           [{Rangers, Center, Active}, {Blackhawks, Center, Former}, {Kings, Center, Former}]
Bob            [{Devils, Defense, Active}, {Maple Leafs, Defense, Former}, {Canadiens, Defense, Former}]

The schema is:

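from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# Each player has a name and an array of (team, position, status) structs.
hockey_schema = StructType([
    StructField("player_name", StringType(), True),
    StructField("team_history", ArrayType(
        StructType([
            StructField("team",     StringType(), True),
            StructField("position", StringType(), True),
            StructField("status",   StringType(), True),
        ])), True),
])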

The JSON looks like this:

[{ "player_name" : "John", "team_history" : [ { "team" : "Rangers", "position" : "Center", "status" : "Active" }, { "team" : "Blackhawks", "position" : "Center", "status" : "Former"}, { "team" : "Kings", "position" : "Center", "status" : "Former"} ] },
 { "player_name" : "Bob", "team_history" : [ { "team" : "Devils", "position" : "Defense", "status" : "Active" }, { "team" : "Maple Leafs", "position" : "Defense", "status" : "Former"}, { "team" : "Canadiens", "position" : "Defense", "status" : "Former"} ] }]

I want to "explode" the contents of the team_history column to create a new DataFrame named df_exploded whose only columns are team and status, like this:

team          status
Rangers       Active
Blackhawks    Former
Kings         Former
Devils        Active
Maple Leafs   Former
Canadiens     Former

How can I use the explode() function in PySpark to create the desired df_exploded DataFrame?
Thank you!

pyspark

Source: https://stackoverflow.com/questions/77225668/how-to-extract-selected-elements-from-array-using-explode-in-pyspark

1 answer


enyaitl31#

Using explode and then extracting the fields you are interested in from the resulting struct seems to do the trick:

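from pyspark.sql import functions as F

# One output row per team_history element; keep only its team and status.
df_exploded = spdf \
    .select(F.explode("team_history").alias("s")) \
    .select("s.team", "s.status")
df_exploded.show(truncate=False)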

+-----------+------+
|team       |status|
+-----------+------+
|Rangers    |Active|
|Blackhawks |Former|
|Kings      |Former|
|Devils     |Active|
|Maple Leafs|Former|
|Canadiens  |Former|
+-----------+------+

Answered 2023-10-15
