How to remove nulls with the array_remove Spark SQL built-in function

Posted by wnvonmuf on 2023-02-24 in Apache


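Spark 2.4 introduced some useful new Spark SQL functions for arrays, but I was a little puzzled when I found that the result of select array_remove(array(1, 2, 3, null, 3), null) is null rather than [1, 2, 3, 3].
Is this the expected behavior? Is it possible to remove nulls with array_remove?
By the way, the workaround I am currently using is a higher-order function in Databricks: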
select filter(array(1, 2, 3, null, 3), x -> x is not null)
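To make the difference concrete, here is a pure-Python model (not Spark code) of the two expressions above, with None standing in for SQL NULL. The helper names model_array_remove and model_filter_not_null are hypothetical. The model assumes, as observed in the question, that array_remove returns NULL whenever the element to remove is NULL, while filter simply keeps the elements for which the predicate is true.

```python
from typing import List, Optional

# None models SQL NULL, both for the whole array and for individual elements.
SqlArray = Optional[List[Optional[int]]]


def model_array_remove(arr: SqlArray, element: Optional[int]) -> SqlArray:
    """Hypothetical model of array_remove: a NULL argument yields a NULL result."""
    if arr is None or element is None:
        return None
    # NULL elements never compare equal to a non-NULL element, so they are kept.
    return [x for x in arr if x is None or x != element]


def model_filter_not_null(arr: SqlArray) -> SqlArray:
    """Hypothetical model of filter(arr, x -> x is not null)."""
    if arr is None:
        return None
    return [x for x in arr if x is not None]


data = [1, 2, 3, None, 3]
print(model_array_remove(data, None))  # None, i.e. SQL NULL
print(model_filter_not_null(data))     # [1, 2, 3, 3]
```

Under this model, array_remove can only ever drop non-NULL values, which is why the filter workaround (or one of the tricks in the answers) is needed for NULLs.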

apache-spark

Source: https://stackoverflow.com/questions/54159964/how-to-remove-nulls-with-array-remove-spark-sql-built-in-function

5 answers


ffx8fchx1#

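To answer your first question, whether this is the expected behavior: yes. The official notebook (https://docs.databricks.com/_static/notebooks/apache-spark-2.4-functions.html) describes array_remove as "Remove all elements that equal to the given element from the given array." NULL represents an undefined value, so the result is undefined as well.
So I think NULL is simply outside the scope of this function.
It's good that you found a way around this. You could also use spark.sql("""SELECT array_except(array(1, 2, 3, 3, null, 3, 3,3, 4, 5), array(null))""").show(), but the drawback is that duplicates are removed from the result.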
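A pure-Python model of that array_except call, for illustration only. It assumes, as the answer describes, that the result is de-duplicated and that a NULL in the second array removes NULLs from the first; keeping the order of first occurrence is a further assumption of this sketch. The function name model_array_except is hypothetical.

```python
from typing import Hashable, List, Optional


def model_array_except(left: List[Optional[Hashable]],
                       right: List[Optional[Hashable]]) -> List[Optional[Hashable]]:
    """Hypothetical model of array_except: set difference without duplicates.

    None (NULL) is treated like an ordinary value here, so [None] on the
    right removes NULLs from the left.
    """
    exclude = set(right)
    seen = set()
    result = []
    for x in left:
        if x in exclude or x in seen:
            continue  # excluded value, or a duplicate already emitted
        seen.add(x)
        result.append(x)
    return result


print(model_array_except([1, 2, 3, 3, None, 3, 3, 3, 4, 5], [None]))
# [1, 2, 3, 4, 5]: the NULL is gone, but so are the repeated 3s
```

This is the trade-off the answer mentions: set semantics make NULL removable, and the same set semantics collapse duplicates.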


5t7ly7z52#

You can do this in Spark 2 (2.4 or later, where array_remove is available):

import org.apache.spark.sql.functions._
import org.apache.spark.sql._

/**
  * Array without nulls
  * For complex types, you are responsible for passing in a nullPlaceholder of the same type as elements in the array
  */
def non_null_array(columns: Seq[Column], nullPlaceholder: Any = "רכוב כל יום"): Column =
  array_remove(array(columns.map(c => coalesce(c, lit(nullPlaceholder))): _*), nullPlaceholder)
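The same sentinel trick, modeled in plain Python: each NULL is first replaced by a placeholder (what coalesce(c, lit(nullPlaceholder)) does per column), then every occurrence of the placeholder is removed (what array_remove does). The function name model_non_null_array is hypothetical. Note that any genuine value equal to the placeholder is dropped too, which is why the Scala version defaults to an unlikely string.

```python
from typing import Any, List, Optional

# Same placeholder as the Scala snippet; any value that never occurs in the data works.
SENTINEL = "רכוב כל יום"


def model_non_null_array(values: List[Optional[Any]], placeholder: Any = SENTINEL) -> List[Any]:
    """Hypothetical model of non_null_array: coalesce NULLs to a placeholder, then remove it."""
    coalesced = [placeholder if v is None else v for v in values]  # coalesce(c, lit(placeholder))
    return [v for v in coalesced if v != placeholder]               # array_remove(..., placeholder)


print(model_non_null_array(["a", None, "b", None]))  # ['a', 'b']
print(model_non_null_array([1, None, 1], placeholder=-1))  # [1, 1]: duplicates survive
```

Unlike array_except, this keeps duplicates, at the cost of reserving one value as the placeholder.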

Spark 3 adds array filtering to the DataFrame API, so you can do:

df.select(filter(col("array_column"), x => x.isNotNull))


zlhcx6iw3#

https://docs.databricks.com/_static/notebooks/apache-spark-2.4-functions.html
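array_remove(array, T): Remove all elements that equal to the given element from the given array.
Note: I am just quoting the documentation, which uses the same data. NULL can never be equal to NULL.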
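SQL equality uses three-valued logic: NULL = NULL evaluates to NULL (unknown), not true, so a NULL element never "matches" anything. Spark also has a null-safe equality operator, <=>, for which two NULLs are equal. A small Python model of both, with None as NULL (the function names are hypothetical):

```python
from typing import Any, Optional


def sql_eq(a: Any, b: Any) -> Optional[bool]:
    """Model of SQL `=`: NULL if either side is NULL, otherwise ordinary equality."""
    if a is None or b is None:
        return None
    return a == b


def sql_null_safe_eq(a: Any, b: Any) -> bool:
    """Model of Spark's `<=>`: two NULLs are equal, NULL vs. non-NULL is false."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


print(sql_eq(None, None))            # None (unknown)
print(sql_null_safe_eq(None, None))  # True
print(sql_null_safe_eq(None, 3))     # False
```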


hwazgwia4#

I don't think you can use array_remove() or array_except() to solve your problem. This is not an elegant solution, but it may help:

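from pyspark.sql import functions as F

@F.udf("array<string>")
def udf_remove_nulls(arr):
    if arr is None:  # a NULL array column arrives as None; keep it NULL
        return None
    return [i for i in arr if i is not None]

df = df.withColumn("col_wo_nulls", udf_remove_nulls(df["array_column"]))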


4nkexdtk5#

If you also want to remove duplicates and return each distinct non-NULL value only once, you can use array_except:

from pyspark.sql import functions as f
f.array_except(f.col("array_column_with_nulls"), f.array(f.lit(None)))

or the equivalent SQL:

array_except(your_array_with_NULLs, array(NULL))

