How to check whether multiple column values of a row are not null, and then add a true/false result column in Spark Scala

kupeojn6 posted on 2021-05-26 in Spark


Hi, how are you? Here are my two DataFrames:

val id_df = Seq(("1","gender"),("2","city"),("3","state"),("4","age")).toDF("id","type")

val main_df = Seq(("male","los angeles","null"),("female","new york","new york")).toDF("1","2","3")

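Here they are in table form:

id_df:
+---+------+
|id |type  |
+---+------+
|1  |gender|
|2  |city  |
|3  |state |
|4  |age   |
+---+------+

main_df:
+------+-----------+--------+
|1     |2          |3       |
+------+-----------+--------+
|male  |los angeles|null    |
|female|new york   |new york|
+------+-----------+--------+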

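And this is what I want the resulting DataFrame to look like:

+------+-----------+--------+---------------+
|1     |2          |3       |meets condition|
+------+-----------+--------+---------------+
|male  |los angeles|null    |false          |
|female|new york   |new york|true           |
+------+-----------+--------+---------------+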

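I want to check every id in id_df: if it exists as a column of main_df, check whether that row's values for all such ids are non-null. If none of them are null, put "true" in the row's meets condition column; otherwise put "false". Note that id 4 (age) is not a column of main_df, so it is ignored.
How can I do this?
Thanks a lot, and have a nice day.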
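The rule described above can be pinned down in plain Scala, without Spark, before worrying about DataFrames. This is a minimal sketch under assumptions: rows are modelled as hypothetical Map[String, String] values, and the question's "null" entry is treated as a real null (the question's code actually stores the literal string "null", which a null check would consider present).

```scala
object MeetsConditionSketch {
  // Hypothetical in-memory model: each row maps a column name to its value,
  // with Scala's null standing in for a missing value.
  type Row = Map[String, String]

  val ids: Seq[String] = Seq("1", "2", "3", "4") // the id column of id_df
  val columns: Seq[String] = Seq("1", "2", "3")  // the columns of main_df
  val rows: Seq[Row] = Seq(
    Map("1" -> "male", "2" -> "los angeles", "3" -> null),
    Map("1" -> "female", "2" -> "new york", "3" -> "new york")
  )

  // Ids that are not columns of main_df (here "4", i.e. age) are ignored.
  val targetIds: Set[String] = ids.toSet.intersect(columns.toSet)

  // A row meets the condition only if every target column holds a non-null value.
  def meetsCondition(row: Row): Boolean =
    targetIds.forall(id => row(id) != null)

  def main(args: Array[String]): Unit =
    rows.foreach(r => println(s"$r -> ${meetsCondition(r)}"))
}
```

The intersection followed by an "all non-null" check is exactly what the Spark expression has to compute per row.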

scala apache-spark

Source: https://stackoverflow.com/questions/64029726/how-to-check-whether-multiple-columns-values-of-a-row-are-not-null-and-then-add

1 answer


uemypmqf1#

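Allow me to start with two short observations:
I believe it is safer to avoid naming columns with single digits. Consider a case where we need to evaluate the expression "1 is not null": do we mean column 1, or the value 1 itself?
As far as I can tell, there is no need to store and process the target columns through a DataFrame. Doing so creates overhead that can easily be avoided by using a simple Scala collection (Seq, Array, Set, etc.).
Here is how you can solve the problem: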
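To make the first observation concrete, here is a sketch comparing three ways of null-checking column "3". It assumes a local SparkSession (in spark-shell you would pass the existing spark), and the object and method names are hypothetical. As far as I understand Spark SQL's parser, an unquoted 3 inside expr is read as an integer literal, so "3 is not null" is true for every row, while col("3") and the backquoted `3` both refer to the column.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

object DigitColumnNames {
  // Per row: (col("3") is not null, expr "3 is not null", expr "`3` is not null").
  def nullChecks(spark: SparkSession): Seq[(Boolean, Boolean, Boolean)] = {
    import spark.implicits._

    // Same shape as the question's main_df, but with a real null in column "3".
    val df = Seq(("male", "los angeles", null), ("female", "new york", "new york"))
      .toDF("1", "2", "3")

    df.select(
      col("3").isNotNull.alias("by_col"),           // col() takes a name: column "3"
      expr("3 is not null").alias("literal"),       // integer literal 3: always true
      expr("`3` is not null").alias("quoted")       // backquotes: column "3" again
    ).as[(Boolean, Boolean, Boolean)].collect().toSeq // tuples map columns by ordinal
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell, pass the existing `spark` instead.
    val spark = SparkSession.builder().master("local[*]").appName("digit-columns").getOrCreate()
    nullChecks(spark).foreach(println)
    spark.stop()
  }
}
```

Only the middle check silently ignores the data, which is the kind of mistake digit-only column names invite.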

import org.apache.spark.sql.functions.col
import spark.implicits._ // needed for toDF; assumes a SparkSession named spark, as in spark-shell

val id_df = Seq(
  ("c1","gender"),
  ("c2","city"),
  ("c3","state"),
  ("c4","age")
).toDF("id","type")

val main_df = Seq(
    ("male", "los angeles", null),
    ("female", "new york", "new york"),
    ("trans", null, "new york")
).toDF("c1","c2","c3")

val targetCols = id_df.collect()
                      .map{_.getString(0)} //get id
                      .toSet //convert current sequence to a set (required for the intersection)
                      .intersect(main_df.columns.toSet) //get common columns with main_df
                      .map(col(_).isNotNull) //convert c1,..cN to col(c[i]).isNotNull
                      .reduce(_ && _) // apply the AND operator between items

// (((c1 IS NOT NULL) AND (c2 IS NOT NULL)) AND (c3 IS NOT NULL))

main_df.withColumn("meets_conditions", targetCols).show(false)

// +------+-----------+--------+----------------+
// |c1    |c2         |c3      |meets_conditions|
// +------+-----------+--------+----------------+
// |male  |los angeles|null    |false           |
// |female|new york   |new york|true            |
// |trans |null       |new york|false           |
// +------+-----------+--------+----------------+
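A variant of the same idea, sketched under assumptions: the helper withMeetsConditions is hypothetical, it works directly on the question's original digit column names, and it folds the conditions with foldLeft starting from lit(true) rather than reduce, because reduce throws an exception on an empty collection when none of the ids matches a column. With this fold, a DataFrame with no matching columns gets true in every row; if that case should fail instead, check the filtered id list before building the condition.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object MeetsConditions {
  // Hypothetical helper: adds a Boolean column that is true when every id that is
  // also a column of df holds a non-null value in that row.
  def withMeetsConditions(df: DataFrame, ids: Seq[String], outCol: String = "meets_conditions"): DataFrame = {
    val present = df.columns.toSet
    val condition: Column = ids.distinct
      .filter(present.contains)          // ignore ids such as "4" (age) that are not columns
      .map(id => col(id).isNotNull)
      .foldLeft(lit(true))(_ && _)       // unlike reduce, does not throw when nothing matches
    df.withColumn(outCol, condition)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell, reuse the existing `spark`.
    val spark = SparkSession.builder().master("local[*]").appName("meets-conditions").getOrCreate()
    import spark.implicits._

    // The question's data with its original digit column names and a real null.
    val idDf = Seq(("1", "gender"), ("2", "city"), ("3", "state"), ("4", "age")).toDF("id", "type")
    val mainDf = Seq(("male", "los angeles", null), ("female", "new york", "new york")).toDF("1", "2", "3")

    // id_df is tiny, so collecting its ids to the driver is cheap.
    val ids = idDf.select("id").as[String].collect().toSeq
    withMeetsConditions(mainDf, ids).show(false)
    spark.stop()
  }
}
```

Collecting the ids into a plain Seq keeps the column selection on the driver, in line with the answer's point about using a Scala collection instead of a DataFrame for the target columns.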

Answered 2021-05-26
