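I want to use the filter method to filter some records out of a DataFrame. I have an array of address structs that I am comparing against a column value. I am using the following code: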
entityJoinB_df.filter(col("addressstructm.streetName").cast(StringType) =!= (col("streetName")))
I want to remove elements from the address struct array based on that comparison. The sample schema is as follows:
root
 |-- apartmentnumber: string (nullable = true)
 |-- streetName: string (nullable = true)
 |-- streetName2: string (nullable = true)
 |-- fullName: string (nullable = false)
 |-- address: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- streetName: string (nullable = true)
 |    |    |-- streetName2: string (nullable = true)
 |    |    |-- buildingName: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |    |    |-- city: string (nullable = true)
 |-- isActive: boolean (nullable = false)
But it does not work. What is wrong? Can anyone help?
Sample input:
[
  {
    "apartmentnumber": 122,
    "streetName": "ABC ABC",
    "streetName2": "CBD",
    "fullName": "MR. X",
    "address": [
      {
        "streetName": "ABC ABC",
        "streetName2": "CBD",
        "buildingName": "ONE",
        "city": "NY"
      },
      {
        "streetName": "XYZ ABC",
        "streetName2": "XCB",
        "buildingName": "ONE",
        "city": "NY"
      }
    ]
  }
]
Sample output:
[
  {
    "apartmentnumber": 122,
    "streetName": "ABC ABC",
    "streetName2": "CBD",
    "fullName": "MR. X",
    "address": [
      {
        "streetName": "XYZ ABC",
        "streetName2": "XCB",
        "buildingName": "ONE",
        "city": "NY"
      }
    ]
  }
]
Thanks, 乌本
2 Answers
pxiryf3j1#
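I think your problem can be solved by modifying the filter expression. Assuming addressstructm is the alias of the DataFrame, the sample structure below is similar to yours.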
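A minimal sketch of one such expression, not necessarily the one this answer intended. DataFrame.filter drops whole rows, so it cannot remove individual elements from an array column. The SQL higher-order function filter (Spark 2.4+), applied through withColumn, rewrites the array instead. The case classes, the object name, and removeMatching are hypothetical names for illustration. The array column is assumed to be called address, as in the schema above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

// Hypothetical case classes covering only the fields used in the sample input.
case class Address(streetName: String, streetName2: String,
                   buildingName: String, city: String)
case class Person(apartmentnumber: String, streetName: String, streetName2: String,
                  fullName: String, address: Seq[Address])

object RemoveMatchingAddresses {
  // Keeps only the array elements whose streetName differs from the row's
  // top-level streetName. Inside the lambda, `a` is one array element and the
  // bare `streetName` resolves to the top-level column.
  def removeMatching(df: DataFrame): DataFrame =
    df.withColumn("address",
      expr("filter(address, a -> a.streetName != streetName)"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("remove-matching-addresses")
      .getOrCreate()
    import spark.implicits._

    // One row shaped like the sample input in the question.
    val df = Seq(
      Person("122", "ABC ABC", "CBD", "MR. X", Seq(
        Address("ABC ABC", "CBD", "ONE", "NY"),
        Address("XYZ ABC", "XCB", "ONE", "NY")))
    ).toDF()

    removeMatching(df).show(truncate = false)
    spark.stop()
  }
}
```

For the sample row, only the XYZ ABC element should remain. Elements whose streetName is null are also dropped, because the predicate evaluates to null for them. If they should be kept, NOT (a.streetName <=> streetName) is a null-safe alternative.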
abithluo2#
Try the code below.
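As a rough sketch of what such code could look like, and not necessarily this answer's own, the same element-level removal can be written with the Column API. It assumes Spark 3.0 or later, where org.apache.spark.sql.functions.filter accepts a Column => Column lambda. The object name and removeMatching are hypothetical, and the input is a one-line version of the sample JSON from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, filter}

object FilterAddressColumnApi {
  // `addr` stands for one element of the address array; col("streetName")
  // refers to the top-level column of the same row.
  def removeMatching(df: DataFrame): DataFrame =
    df.withColumn("address",
      filter(col("address"),
        addr => addr.getField("streetName") =!= col("streetName")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-address-column-api")
      .getOrCreate()
    import spark.implicits._

    // The sample input from the question, as a single JSON record.
    val json =
      """{"apartmentnumber":122,"streetName":"ABC ABC","streetName2":"CBD","fullName":"MR. X","address":[{"streetName":"ABC ABC","streetName2":"CBD","buildingName":"ONE","city":"NY"},{"streetName":"XYZ ABC","streetName2":"XCB","buildingName":"ONE","city":"NY"}]}"""
    val df = spark.read.json(Seq(json).toDS())

    // Rows are kept; only the array inside each row is rewritten.
    removeMatching(df).show(truncate = false)
    spark.stop()
  }
}
```

To compare on more than one field, the lambda body can combine conditions, for example by adding a second comparison on streetName2 with ||.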