I have a DataFrame with the following schema. The `translation_version` field under translations --> languages (no, pt, ...) has type null. I want to cast every `translation_version` to string. There are 17 languages under `translations`.
```
root
|-- translations: struct (nullable = true)
| |-- no: struct (nullable = true)
| | |-- Description: string (nullable = true)
| | |-- class: string (nullable = true)
| | |-- description: string (nullable = true)
| | |-- translation_version: null (nullable = true) // Want to cast as string
| |-- pt: struct (nullable = true)
| | |-- Description: string (nullable = true)
| | |-- class: string (nullable = true)
| | |-- description: string (nullable = true)
| | |-- translation_version: null (nullable = true)
| |-- fr: struct (nullable = true)
| | |-- Description: string (nullable = true)
| | |-- class: string (nullable = true)
| | |-- description: string (nullable = true)
| | |-- translation_version: null (nullable = true)
```
I tried `df = df.na.fill('null')`, but it changed nothing. I also tried forcing the cast with the following code:
df = df.withColumn("translations", F.col("translations").cast("struct<struct<translation_version: string>>"))
but that returned the following error:
pyspark.sql.utils.ParseException: u"\nmismatched input '<' expecting ':'(line 1, pos 13)\n\n== SQL ==\nstruct<struct<translation_version: string>>\n-------------^^^\n"
Do you know how to cast `translation_version` to string for every language?
1 Answer
62lalag4 #1
This should work.
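No code accompanies the answer here, so what follows is only a minimal sketch of one possible approach, not necessarily what the answerer posted. The parse error in the question comes from the type string itself: every field inside `struct<...>` needs a name (`struct<no:struct<...>>`), so the parser treats the inner `struct` as a field name and then expects a `:`. One way around this is to build the full type string from the DataFrame's own schema, changing only `translation_version` to `string`, and cast the whole `translations` column in one go. The helper names `struct_ddl`, `translations_ddl`, and `cast_translation_version` are hypothetical, and the sketch assumes every field under `translations` is itself a struct of simple fields, as in the schema shown.

```python
def struct_ddl(fields):
    """Build a type string such as struct<`a`:string,`b`:int> from (name, type) pairs."""
    return "struct<" + ",".join("`{}`:{}".format(name, ddl) for name, ddl in fields) + ">"


def translations_ddl(schema_by_lang, target="translation_version"):
    """Build the type string for the whole translations column.

    schema_by_lang is a list of (language, [(field name, type string), ...]) pairs,
    kept as a list so the original field order is preserved.
    """
    langs = []
    for lang, fields in schema_by_lang:
        # Only the target field changes type; every other field keeps its current one.
        retyped = [(name, "string" if name == target else ddl) for name, ddl in fields]
        langs.append((lang, struct_ddl(retyped)))
    return struct_ddl(langs)


def cast_translation_version(df, column="translations"):
    """Return df with <column>.<language>.translation_version cast to string."""
    outer = df.schema[column].dataType  # StructType with one field per language
    schema_by_lang = [
        (lang.name, [(f.name, f.dataType.simpleString()) for f in lang.dataType.fields])
        for lang in outer.fields
    ]
    # Column.cast accepts a type string; field names are backtick-quoted above.
    return df.withColumn(column, df[column].cast(translations_ddl(schema_by_lang)))
```

With the DataFrame from the question, `df = cast_translation_version(df)` followed by `df.printSchema()` should show `translation_version: string` under each language. Because the language list is read from the schema, the 17 language codes do not have to be typed out by hand.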