How to access the values of a multi-nested DataFrame using Spark

Posted by cvxl0en2 on 2023-05-01 in Apache


I have the following DataFrame schema:

root
 |-- event: struct (nullable = false)
 |    |-- code: string (nullable = true)
 |    |-- idEvent: string (nullable = true)
 |    |-- contract: struct (nullable = false)
 |    |    |-- idApplication: string (nullable = true)
 |    |-- version: struct (nullable = false)
 |    |    |-- idVersion: string (nullable = true)
 |    |    |-- entity: array (nullable = false)
 |    |    |    |-- element: struct (containsNull = false)
 |    |    |    |    |-- idEntity: string (nullable = true)
 |    |    |    |    |-- entityType: array (nullable = false)
 |    |    |    |    |    |-- element: struct (containsNull = false)
 |    |    |    |    |    |    |-- entityNumber: string (nullable = true)
 |    |    |    |    |    |    |-- entityVersion: array (nullable = false)
 |    |    |    |    |    |    |    |-- element: struct (containsNull = false)
 |    |    |    |    |    |    |    |    |-- entityCode: string (nullable = true)
 |    |    |    |    |    |    |    |    |-- idCode: string (nullable = true)

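This schema gives a DataFrame with a single column: event.
I would like to retrieve the idCode attribute located inside the entityVersion array. Is there a way to retrieve the idCode values without flattening the whole DataFrame with something like explode?
Thank you very much!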

apache-spark

Source: https://stackoverflow.com/questions/76121811/how-to-access-to-the-value-of-a-multi-nested-dataframe-using-spark

1 Answer


4nkexdtk1#

First load the JSON into a variable df, then use select as in the code below:

val df1 = df.select("event.version.entity.entityType.entityVersion.idCode")
df1.show()

There is no need to pull the whole dataset or to explode the columns.
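
If the dotted path above fails to resolve on your Spark version (field extraction through more than one level of array nesting may be rejected at analysis time), one possible alternative that still avoids explode is to flatten the intermediate arrays inside a single select. The following is only a minimal sketch, assuming Spark 2.4 or later (where flatten is available). The case classes, the sample values, and the names NestedIdCodes and selectIdCodes are hypothetical and exist only to reproduce the schema from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, flatten}

// Hypothetical case classes mirroring the schema shown in the question.
case class EntityVersion(entityCode: String, idCode: String)
case class EntityType(entityNumber: String, entityVersion: Seq[EntityVersion])
case class Entity(idEntity: String, entityType: Seq[EntityType])
case class Version(idVersion: String, entity: Seq[Entity])
case class Contract(idApplication: String)
case class Event(code: String, idEvent: String, contract: Contract, version: Version)
case class Record(event: Event)

object NestedIdCodes {
  // Returns one row per input row, with all idCode values in one array<string> column.
  def selectIdCodes(df: DataFrame): DataFrame = {
    // entity is array<struct>, so extracting entityType yields array<array<struct>>;
    // flatten collapses that back to array<struct>.
    val entityTypes = flatten(col("event.version.entity.entityType"))
    // Same step one level deeper: array<array<struct>> becomes array<struct>.
    val entityVersions = flatten(entityTypes.getField("entityVersion"))
    // Extracting a field from array<struct> yields an array of that field.
    df.select(entityVersions.getField("idCode").as("idCode"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-idCode")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data, for illustration only.
    val types = Seq(
      EntityType("n1", Seq(EntityVersion("ec1", "id1"), EntityVersion("ec2", "id2"))),
      EntityType("n2", Seq(EntityVersion("ec3", "id3"))))
    val event = Event("c1", "e1", Contract("app1"), Version("v1", Seq(Entity("ent1", types))))
    val df = Seq(Record(event)).toDF()

    selectIdCodes(df).show(false)
    spark.stop()
  }
}
```

The result keeps one row per input row, so nothing is exploded; the nesting is removed only inside the selected column.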

Answered on 2023-05-01
