In my DataFrame, I need to convert an array-type column to a string without losing the element names/schema of the data in that column.
My DataFrame schema:
root
|-- accountId: string (nullable = true)
|-- documents: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- accountId: string (nullable = true)
| | |-- agreementId: string (nullable = true)
| | |-- createdBy: string (nullable = true)
| | |-- createdDate: string (nullable = true)
| | |-- id: string (nullable = true)
| | |-- obligations: array (nullable = true)
| | | |-- element: string (containsNull = true)
| | |-- resourceVersion: long (nullable = true)
| | |-- updatedBy: string (nullable = true)
| | |-- updatedDate: string (nullable = true)
Sample DataFrame data (shown here as JSON, but it is a column in a Spark DataFrame):
{
"accountId":"1",
"documents":{
"list":[{
"element":{
"accountId":"1",
"agreementId":"1.2",
"createdDate":"2022-10-06T19:33:42.539646Z",
"externalId":"16",
"id":"123",
"name":"test1.docx",
"obligations":{},
"resourceVersion":1,
"updatedDate":"2022-10-06T19:33:42.680233Z"
}
}]
}
}
{
"accountId":"2",
"documents":{
"list":[{
"element":{
"accountId":"2",
"agreementId":"2.2",
"createdDate":"2022-10-06T19:33:42.539646Z",
"externalId":"18",
"id":"123",
"name":"test2.docx",
"obligations":{},
"resourceVersion":1,
"updatedDate":"2022-10-06T19:33:42.680233Z"
}
}]
}
}
My current code:
from pyspark.sql.functions import col
df_string = df.select([col(c).cast("string") for c in df.columns])
What it produces (the field names inside documents are lost):
{
"accountId":"1",
"documents":[{"1","1.2","2022-10-06T19:33:42.539646Z","16","123","test1.docx","",1,"2022-10-06T19:33:42.680233Z"}]
}
{
"accountId":"2",
"documents":[{"2","2.2","2022-10-06T19:33:42.539646Z","18","123","test2.docx","","1","2022-10-06T19:33:42.680233Z"}]
}
What I need (the field names inside documents must be preserved):
{
"accountId":"1",
"documents":[{"accountId":"1","agreementId":"1.2","createdDate":"2022-10-06T19:33:42.539646Z","externalId":"16","id":"123","name":"test1.docx","obligations":"","resourceVersion":"1","updatedDate":"2022-10-06T19:33:42.680233Z"}]
}
{
"accountId":"2",
"documents":[{"accountId":"2","agreementId":"2.2","createdDate":"2022-10-06T19:33:42.539646Z","externalId":"18","id":"123","name":"test2.docx","obligations":"","resourceVersion":"1","updatedDate":"2022-10-06T19:33:42.680233Z"}]
}