PySpark: cast NullType fields to string inside a struct-type column

Posted by avwztpqn on 2021-05-27 in Spark

I have a DataFrame with the schema below. The `translation_version` field under `translations` --> languages (`no`, `pt`, ...) has type `null`. I want to cast every `translation_version` to string. There are 17 languages under `translations`.

```
root
|-- translations: struct (nullable = true)
| |-- no: struct (nullable = true)
| | |-- Description: string (nullable = true)
| | |-- class: string (nullable = true)
| | |-- description: string (nullable = true)
| | |-- translation_version: null (nullable = true) // Want to cast as string
| |-- pt: struct (nullable = true)
| | |-- Description: string (nullable = true)
| | |-- class: string (nullable = true)
| | |-- description: string (nullable = true)
| | |-- translation_version: null (nullable = true)
| |-- fr: struct (nullable = true)
| | |-- Description: string (nullable = true)
| | |-- class: string (nullable = true)
| | |-- description: string (nullable = true)
| | |-- translation_version: null (nullable = true)
```

I tried `df = df.na.fill('null')`, but it changed nothing. I also tried to force the cast with the following code:

df = df.withColumn("translations", F.col("translations").cast("struct<struct<translation_version: string>>"))

But this returned the following error:

pyspark.sql.utils.ParseException: u"\nmismatched input '<' expecting ':'(line 1, pos 13)\n\n== SQL ==\nstruct<struct<translation_version: string>>\n-------------^^^\n"

Do you know how to cast `translation_version` to string for every language?
62lalag4 #1

This should work:

from pyspark.sql.functions import col, struct
from pyspark.sql.types import StructType, StructField, StringType

# Target schema for each language struct: translation_version becomes a string.
schema = StructType([
    StructField("Description", StringType(), True),
    StructField("class", StringType(), True),
    StructField("description", StringType(), True),
    StructField("translation_version", StringType(), True),
])

df_1 = (
    df
    # Promote each language struct to a top-level column.
    .select("translations.*")
    # Cast every language struct and reassemble them under "translations".
    .withColumn("translations", struct(
        col("fr").cast(schema).alias("fr"),
        col("pt").cast(schema).alias("pt"),
        col("no").cast(schema).alias("no"),
    ))
    # Drop the temporary top-level language columns.
    .drop("fr", "pt", "no")
)
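
As a side note on the `ParseException` in the question: a struct type string needs a `name:type` pair for every field, so `struct<struct<...>>` fails at the second `<`, where the parser expects a field name followed by `:`. With 17 languages, it may be easier to generate the full type string than to write it by hand. Below is a minimal sketch in plain Python. The helper `build_translations_ddl` and the constant `LANGUAGE_FIELDS` are hypothetical names introduced here, and the sketch assumes every language struct has exactly the four fields shown in the question, in that order.

```python
# Field layout of each per-language struct, copied from the schema in the question.
# Assumption: all 17 languages share exactly these fields, in this order.
LANGUAGE_FIELDS = ["Description", "class", "description", "translation_version"]


def build_translations_ddl(languages, fields=LANGUAGE_FIELDS):
    """Return a Spark type string in which every nested field is a string.

    Hypothetical helper: it only builds text and does not touch Spark.
    """
    # Backticks quote the names so they are always parsed as identifiers.
    inner = ",".join(f"`{name}`:string" for name in fields)
    per_language = ",".join(f"`{lang}`:struct<{inner}>" for lang in languages)
    return f"struct<{per_language}>"


# Illustrative list only; extend it with the remaining languages.
ddl = build_translations_ddl(["no", "pt", "fr"])
print(ddl)

# With a DataFrame at hand, the cast would then look like this (not run here):
# df = df.withColumn("translations", F.col("translations").cast(ddl))
```

Casting one struct to another matches fields by position, so the language list should follow the order the fields already have in the DataFrame; `df.schema["translations"].dataType.fieldNames()` should return that order. Unlike the `select("translations.*")` approach, this leaves any other top-level columns in place.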
