How do I iterate over nested arrays in a PySpark schema?

Posted by khbbv19g on 2023-02-18 in Spark

My current schema is:

root
 |-- C_0_0: double (nullable = true)
 |-- C_0_1: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: **double** (containsNull = true)
 |-- C_0_2: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: double (containsNull = true)

I want to change it to:

root
 |-- C_0_0: double (nullable = true)
 |-- C_0_1: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: **decimal(8,6)** (containsNull = true)
 |-- C_0_2: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: double (containsNull = true)

Since the elements of the nested arrays have no field names, how can I iterate over them to change their type?

Answer #1 (kqqjbcuj)

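You don't need to iterate; a type cast is enough, because casting an array column to another array type converts every element, including the elements of nested arrays.
This works: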

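from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, DecimalType

# Cast the innermost element type from double to decimal(8,6).
# Writing back to "C_0_1" replaces the column instead of adding a new one.
df = df.withColumn("C_0_1", F.col("C_0_1").cast(ArrayType(ArrayType(DecimalType(8, 6)))))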

