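I'm trying to find out whether there is a way to check if a particular column exists in a DataFrame using Java Spark. I searched and found some suggestions for Python, but nothing for Java.
I'm pulling this data from Mongo and trying to check whether certain columns exist. There is no schema validation available for this table in MongoDB.
Below is my schema; I want to check that these fields exist, using my column configuration.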
|-- _id: string (nullable = true)
|-- value: struct (nullable = true)
| |-- acctId: string (nullable = true)
| |-- conId: string (nullable = true)
| |-- dimensions: struct (nullable = true)
| | |-- device: struct (nullable = true)
| | | |-- accountId: long (nullable = true)
| | | |-- addFreeTitleTime: timestamp (nullable = true)
| | | |-- build: string (nullable = true)
| | | |-- country: string (nullable = true)
| | | |-- countryOfResidence: string (nullable = true)
| | | |-- createDate: timestamp (nullable = true)
| | | |-- number: string (nullable = true)
| | | |-- FamilyName: string (nullable = true)
| | | |-- did: long (nullable = true)
| | | |-- deviceToken: string (nullable = true)
| | | |-- initialBuildNumber: string (nullable = true)
| | | |-- language: string (nullable = true)
| | | |-- major: integer (nullable = true)
| | | |-- minor: integer (nullable = true)
| | | |-- model: string (nullable = true)
| | | |-- modelDesc: string (nullable = true)
| | | |-- modelId: string (nullable = true)
| | | |-- modifyDate: timestamp (nullable = true)
| | | |-- preReg: integer (nullable = true)
| | | |-- retailer: string (nullable = true)
| | | |-- serialNumber: string (nullable = true)
| | | |-- softwareUpdateDate: timestamp (nullable = true)
| | | |-- softwareVersion: string (nullable = true)
| | | |-- sourceId: string (nullable = true)
| | | |-- timeZone: string (nullable = true)
| | |-- location: struct (nullable = true)
Your comments and suggestions would be valuable.
Thanks in advance.
2 Answers
92dk7w1h1#
or
fkvaft9z2#
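Yes, you can do this in Java by getting all of the Dataset's columns and checking whether the column you want is among them. For example:
Here I used a very simple way to check whether the column name exists in the array of columns; you can use any other method to perform this check.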
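A minimal sketch of such a check, not the answerer's original code. It assumes the column names come from `Dataset<Row>.columns()`, which returns a `String[]`. The class name `ColumnCheck` and the helpers `hasColumn` and `missingColumns` are hypothetical names chosen for illustration, and the array in `main` is a stand-in for `df.columns()` so the snippet runs without a Spark session.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the names are illustrative, not part of Spark.
final class ColumnCheck {

    // `columns` is assumed to be the array returned by Dataset<Row>.columns(),
    // which contains top-level column names only. The match is exact and case-sensitive.
    static boolean hasColumn(String[] columns, String name) {
        return Arrays.asList(columns).contains(name);
    }

    // Returns the required names that are absent, in the order they were given.
    static List<String> missingColumns(String[] columns, List<String> required) {
        List<String> present = Arrays.asList(columns);
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!present.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Stand-in for df.columns() on the schema in the question.
        String[] columns = {"_id", "value"};

        System.out.println(hasColumn(columns, "value"));   // prints true
        System.out.println(hasColumn(columns, "acctId"));  // prints false: acctId is nested under value
        System.out.println(missingColumns(columns, Arrays.asList("_id", "conId"))); // prints [conId]
    }
}
```

Note that `columns()` lists only top-level names, so for the schema in the question it would contain just `_id` and `value`. Checking a nested field such as `value.dimensions.device.country` would presumably require walking `df.schema()` (a `StructType`) one level at a time instead.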