How to rename the first-level keys of a struct with PySpark in Azure Databricks?

Posted by rvpgvaaj on 2022-11-01 in Spark

I want to rename the keys of the first-level objects inside my payload.

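import json
from pyspark.sql.functions import *
# 'shape' is an illustrative value added so the sample matches the schema printed below
ds = {'Fruits': {'apple': {'color': 'red', 'shape': 'round'}, 'mango': {'color': 'green'}}, 'Vegetables': None}
# spark.read.json expects an RDD of JSON strings, so serialize the dict first
df = spark.read.json(sc.parallelize([json.dumps(ds)]))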
df.printSchema()
"""
root
 |-- Fruits: struct (nullable = true)
 |    |-- apple: struct (nullable = true)
 |    |    |-- color: string (nullable = true)
 |    |    |-- shape: string (nullable = true)
 |    |-- mango: struct (nullable = true)
 |    |    |-- color: string (nullable = true)
 |-- Vegetables: string (nullable = true)
"""

Desired output:

root
 |-- Fruits: struct (nullable = true)
 |    |-- APPLE: struct (nullable = true)
 |    |    |-- color: string (nullable = true)
 |    |    |-- shape: string (nullable = true)
 |    |-- MANGO: struct (nullable = true)
 |    |    |-- color: string (nullable = true)
 |-- Vegetables: string (nullable = true)

In this case, I want to rename the first-level keys to uppercase.
If I had a map type, I could use transform_keys:

df.select(transform_keys("Fruits", lambda k, _: upper(k)).alias("data_upper")).display()

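Unfortunately, I have a struct type.
AnalysisException: cannot resolve 'transform_keys(Fruits, lambdafunction(upper(x_18), x_18, y_19))' due to argument data type mismatch: argument 1 requires map type, however, 'Fruits' is of struct<apple:struct<color:string,shape:string>,mango:struct<color:string>> type.
I am using Databricks Runtime 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).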

pyspark

Source: https://stackoverflow.com/questions/74052861/how-to-rename-the-first-level-keys-of-struct-with-pyspark-in-azure-databricks

1 Answer

5gfr0r5j1#

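The function you tried to use (transform_keys) operates on map-type columns, but your column is a struct.
You can use withField instead. Calling it with the name 'APPLE' replaces the existing 'apple' field under the new name, as the schema output below shows; this relies on Spark's default case-insensitive field resolution (spark.sql.caseSensitive=false).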

from pyspark.sql import functions as F
ds = spark.createDataFrame([], 'Fruits struct<apple:struct<color:string,shape:string>,mango:struct<color:string>>, Vegetables string')
ds.printSchema()

# root
# |-- Fruits: struct (nullable = true)
# |    |-- apple: struct (nullable = true)
# |    |    |-- color: string (nullable = true)
# |    |    |-- shape: string (nullable = true)
# |    |-- mango: struct (nullable = true)
# |    |    |-- color: string (nullable = true)
# |-- Vegetables: string (nullable = true)

ds = ds.withColumn('Fruits', F.col('Fruits').withField('APPLE', F.col('Fruits.apple')))
ds = ds.withColumn('Fruits', F.col('Fruits').withField('MANGO', F.col('Fruits.mango')))

ds.printSchema()

# root
# |-- Fruits: struct (nullable = true)
# |    |-- APPLE: struct (nullable = true)
# |    |    |-- color: string (nullable = true)
# |    |    |-- shape: string (nullable = true)
# |    |-- MANGO: struct (nullable = true)
# |    |    |-- color: string (nullable = true)
# |-- Vegetables: string (nullable = true)

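You can also rebuild the struct, but the rebuilt struct must list every field you want to keep; any field you leave out is dropped.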

ds = ds.withColumn('Fruits', F.struct(
    F.col('Fruits.apple').alias('APPLE'),
    F.col('Fruits.mango').alias('MANGO'),
))

ds.printSchema()

# root
# |-- Fruits: struct (nullable = true)
# |    |-- APPLE: struct (nullable = true)
# |    |    |-- color: string (nullable = true)
# |    |    |-- shape: string (nullable = true)
# |    |-- MANGO: struct (nullable = true)
# |    |    |-- color: string (nullable = true)
# |-- Vegetables: string (nullable = true)

Answered 2022-11-01
