pyspark - Cast a column inside a nested array

suzh9iv8 posted on 2021-05-27 in Spark

I have a DataFrame with the following schema:

root
 |-- Id: long (nullable = true)
 |-- LastUpdate: string (nullable = true)
 |-- Info: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Purchase: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- Amount: long (nullable = true)
 |    |    |    |    |-- Name: string (nullable = true)
 |    |    |    |    |-- Type: string (nullable = true)

How can I select Amount so that I can cast it?

Tried:

df = df.withColumn("Info.Purchase.Amount", df["Info.Purchase.Amount"].cast(DoubleType()))

But got:

org.apache.spark.sql.AnalysisException: cannot resolve '`Info`.`Purchase`['Amount']'
Answer by ubby3x7f:

You can extract the nested array with:

from pyspark.sql.functions import col

df.select(col("Info").getField("Purchase").getField("Amount")).show()

This gives you a list of all the amounts in the column, which you can then cast.
