I have the following JSON data, which I read with df = spark.read.json('file').
I want to get the key "9ac2b5fc-d2c5-43a8-a9e6-244f02b93997" out of the JSON and put it in a new column called "CustomerId".
{
  "Statements": {
    "9ac2b5fc-d2c5-43a8-a9e6-244f02b93997": {
      "Accounts": [
        {
          "Id": 12345678,
          "Institution": "Bank Name",
          "Name": "Savings Name",
          "AccountNumber": "00000000",
          "Bsb": "000000",
          "CurrentBalance": "0",
          "Available": "0",
          "AccountHolder": "A",
          "AccountAddress": null,
          "AccountType": "SAVINGS",
          "OpeningBalance": "0.0",
          "ClosingBalance": "0.0"
        }
      ]
    }
  }
}
So far I've managed to get the Accounts part with:
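from pyspark.sql.functions import explode
cols = ["id"]
# Rename the single struct under "Statements" (whose name is the customer ID) to "id"
df_statement = df.select("Statements.*").toDF(*cols)
# One row per element of the Accounts array
df_statement = df_statement.withColumn("accounts", explode("id.Accounts"))
df_statement.select("accounts.Institution", "accounts.Bsb", "accounts.AccountNumber").show()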
which returns:
+-----------+------+-------------+
|Institution|   Bsb|AccountNumber|
+-----------+------+-------------+
|  Bank Name|000000|     00000000|
+-----------+------+-------------+
I'd like it to return something like this instead. Thanks!
+------------------------------------+-----------+------+-------------+
|                          CustomerId|Institution|   Bsb|AccountNumber|
+------------------------------------+-----------+------+-------------+
|9ac2b5fc-d2c5-43a8-a9e6-244f02b93997|  Bank Name|000000|     00000000|
+------------------------------------+-----------+------+-------------+
1 answer

Answer #1 (2izufjch)
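I'm not sure whether this solution will work for you, but please check it: I tested it with my own hosted JSON file containing the data you provided.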
Output: