Check whether a column exists in Java Spark

jljoyd4f  posted on 2021-05-27 in Spark

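I am trying to find out whether there is a way to check if a specific column exists in a DataFrame using Java Spark. My search turned up suggestions for Python, but nothing for Java.
I am pulling this data from Mongo and trying to check whether certain columns exist. There is no schema validation available for this table in MongoDB.
Below is my schema; I want to check whether these fields exist against my column configuration.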

 |-- _id: string (nullable = true)
 |-- value: struct (nullable = true)
 |    |-- acctId: string (nullable = true)
 |    |-- conId: string (nullable = true)
 |    |-- dimensions: struct (nullable = true)
 |    |    |-- device: struct (nullable = true)
 |    |    |    |-- accountId: long (nullable = true)
 |    |    |    |-- addFreeTitleTime: timestamp (nullable = true)
 |    |    |    |-- build: string (nullable = true)
 |    |    |    |-- country: string (nullable = true)
 |    |    |    |-- countryOfResidence: string (nullable = true)
 |    |    |    |-- createDate: timestamp (nullable = true)
 |    |    |    |-- number: string (nullable = true)
 |    |    |    |-- FamilyName: string (nullable = true)
 |    |    |    |-- did: long (nullable = true)
 |    |    |    |-- deviceToken: string (nullable = true)
 |    |    |    |-- initialBuildNumber: string (nullable = true)
 |    |    |    |-- language: string (nullable = true)
 |    |    |    |-- major: integer (nullable = true)
 |    |    |    |-- minor: integer (nullable = true)
 |    |    |    |-- model: string (nullable = true)
 |    |    |    |-- modelDesc: string (nullable = true)
 |    |    |    |-- modelId: string (nullable = true)
 |    |    |    |-- modifyDate: timestamp (nullable = true)
 |    |    |    |-- preReg: integer (nullable = true)
 |    |    |    |-- retailer: string (nullable = true)
 |    |    |    |-- serialNumber: string (nullable = true)
 |    |    |    |-- softwareUpdateDate: timestamp (nullable = true)
 |    |    |    |-- softwareVersion: string (nullable = true)
 |    |    |    |-- sourceId: string (nullable = true)
 |    |    |    |-- timeZone: string (nullable = true)
 |    |    |-- location: struct (nullable = true)

Your comments and suggestions would be valuable.
Thanks in advance.


92dk7w1h #1

sourceDF.printSchema()
// root
//  |-- category: string (nullable = true)
//  |-- tags: string (nullable = true)
//  |-- datetime: string (nullable = true)
//  |-- date: string (nullable = true)

val cols = sourceDF.columns
// cols: Array[String] = Array(category, tags, datetime, date)

// filter returns the matching names (an Array), not a Boolean
val categoryMatches = cols.filter(_ == "category")
// categoryMatches: Array[String] = Array(category)

// contains answers the question directly with a Boolean
val isFieldTags = sourceDF.columns.contains("tags")
// isFieldTags: Boolean = true
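
For Java readers, the same check can be sketched on the `String[]` that `Dataset.columns()` returns. This is only a minimal sketch: the class name `ColumnCheck` and the helpers `hasColumn` and `missingColumns` are made up for illustration, and the column array is hard-coded so the code runs without Spark on the classpath. In a real job the array would come from `sourceDF.columns()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnCheck {

    // Exact-match test against the top-level column names.
    public static boolean hasColumn(String[] columns, String name) {
        return Arrays.asList(columns).contains(name);
    }

    // Returns the required names that are absent, in the order they were given.
    public static List<String> missingColumns(String[] columns, List<String> required) {
        List<String> missing = new ArrayList<>(required);
        missing.removeAll(Arrays.asList(columns));
        return missing;
    }

    public static void main(String[] args) {
        // Stand-in for sourceDF.columns(); hard-coded here (assumption).
        String[] cols = {"category", "tags", "datetime", "date"};

        System.out.println(hasColumn(cols, "tags"));  // true
        System.out.println(hasColumn(cols, "tag"));   // false: a substring is not a match
        System.out.println(missingColumns(cols, Arrays.asList("category", "price"))); // [price]
    }
}
```

`missingColumns` is handy when the expected names come from a configuration list, since it reports every absent column in one pass instead of a single true/false.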

fkvaft9z #2

Yes, you can do this in Java by fetching all of the Dataset's columns and checking whether the one you want is among them. For example:

import java.util.Arrays;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
Dataset<Object1> dataSet = spark.read().text("dataPath").as(Encoders.bean(Object1.class)); // load the data into a Dataset
String[] columns = dataSet.columns(); // fetch all top-level column names
System.out.println(Arrays.asList(columns).contains("columnNameToCheckFor")); // exact-match check; Arrays.toString(columns).contains(...) would also match substrings of other names

Here I used a very simple way to check whether the column name exists in the array of columns; you can use any other method to perform this check.
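
One caveat for the schema in the question: `columns()` lists only top-level names (there, `_id` and `value`), so a nested field such as `value.dimensions.device.model` needs a walk down the schema tree. Below is a rough sketch of that walk, not a definitive implementation. To keep it runnable without Spark, the schema is modeled as nested maps, and `NestedFieldCheck`, `hasField`, and `sampleSchema` are hypothetical names. With a real Dataset the same loop would presumably run over `dataSet.schema()`, a `StructType`, descending through each field's `dataType()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NestedFieldCheck {

    // Walks a dotted path such as "value.dimensions.device.model".
    // A struct is modeled as a Map from field name to either a nested Map
    // (another struct) or a String type name (a leaf field). This map model
    // is an assumption standing in for Spark's StructType.
    public static boolean hasField(Map<String, Object> schema, String path) {
        Object current = schema;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return false; // tried to descend into a leaf field
            }
            Map<?, ?> struct = (Map<?, ?>) current;
            if (!struct.containsKey(part)) {
                return false; // no field with this exact name at this level
            }
            current = struct.get(part);
        }
        return true;
    }

    // Builds a small subset of the schema from the question.
    public static Map<String, Object> sampleSchema() {
        Map<String, Object> device = new LinkedHashMap<>();
        device.put("accountId", "long");
        device.put("model", "string");

        Map<String, Object> dimensions = new LinkedHashMap<>();
        dimensions.put("device", device);

        Map<String, Object> value = new LinkedHashMap<>();
        value.put("acctId", "string");
        value.put("dimensions", dimensions);

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("_id", "string");
        root.put("value", value);
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> schema = sampleSchema();
        System.out.println(hasField(schema, "value.dimensions.device.model")); // true
        System.out.println(hasField(schema, "value.dimensions.device.imei"));  // false
        System.out.println(hasField(schema, "_id.anything"));                  // false
    }
}
```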
