How to explode a nested struct in Spark using Scala

Posted by gk7wooem on 2023-05-29 in Scala


I'm working through a Databricks example. The schema of the DataFrame looks like this:

|-- authors: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- author: struct (nullable = true)
|    |    |    |-- key: string (nullable = true)
|    |    |-- key: string (nullable = true)
|    |    |-- type: string (nullable = true)

I'm trying to produce a DataFrame with the following schema:

|-- author_key: string (nullable = true)
|-- key: string (nullable = true)
|-- type: string (nullable = true)

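I don't know how to explode the nested struct, so my plan was to use explode first to turn the array elements into rows, but I'm not sure this is the right approach. The screenshot below shows the result of my code.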


Source: https://stackoverflow.com/questions/76351898/how-to-explode-a-nested-struct-in-spark-using-scala

1 answer


r8uurelv #1

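You can use the explode function to turn each element of the array into its own row, and then pull the fields you need out into separate columns, something like this: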

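import org.apache.spark.sql.functions.explode
import spark.implicits._ // enables the $"col" syntax; `spark` is the active SparkSession

// One row per element of the authors array
val explodedDf = df.select(explode($"authors").alias("elem"))

// Promote the nested fields to top-level columns, then drop the leftover struct
val result = explodedDf
  .withColumn("author_key", $"elem.author.key")
  .withColumn("key", $"elem.key")
  .withColumn("type", $"elem.type")
  .drop("elem")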
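To try the approach above end to end, here is a minimal sketch meant for spark-shell or a notebook cell. It builds a small DataFrame with the same schema as in the question and flattens it with two select calls rather than a chain of withColumn. The sample JSON records and their key values are made up for illustration, and `spark.read.json` on a `Dataset[String]` assumes Spark 2.2 or later.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// Reuses the existing session in a notebook or spark-shell; otherwise starts a local one.
val spark = SparkSession.builder().master("local[1]").appName("explode-authors").getOrCreate()
import spark.implicits._

// Hypothetical sample data shaped like the schema in the question.
val sampleJson = Seq(
  """{"authors":[{"author":{"key":"/authors/OL1A"},"key":"k1","type":"/type/author_role"},
    |{"author":{"key":"/authors/OL2A"},"key":"k2","type":"/type/author_role"}]}""".stripMargin.replace("\n", ""),
  """{"authors":[{"author":{"key":"/authors/OL3A"},"key":"k3","type":"/type/author_role"}]}"""
)
val df = spark.read.json(sampleJson.toDS())

val result = df
  // explode yields one row per array element; each element is a struct
  .select(explode(col("authors")).as("elem"))
  // address the struct's fields with dot notation and rename them
  .select(
    col("elem.author.key").as("author_key"),
    col("elem.key").as("key"),
    col("elem.type").as("type")
  )

result.printSchema()
result.show(truncate = false)
```

Selecting only the three target columns means the intermediate elem struct never appears in the output. Note that explode drops rows whose authors array is null or empty; if those rows need to be kept (with null values in the new columns), explode_outer from the same functions object can be used instead.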

2023-05-29
