Get only the field names that are not null from a PySpark DataFrame

Posted by 8cdiaqws on 2021-07-13 in Spark

I have a PySpark DataFrame df1. Its printSchema() output is shown below.

df1.printSchema()

root
 |-- parent: struct (nullable = true)
 |    |-- childa: struct (nullable = true)
 |    |    |-- x: string (nullable = true)
 |    |    |-- y: string (nullable = true)
 |    |    |-- z: string (nullable = true)
 |    |-- childb: struct (nullable = true)
 |    |    |-- x: string (nullable = true)
 |    |    |-- y: string (nullable = true)
 |    |    |-- z: string (nullable = true)
 |    |-- childc: struct (nullable = true)
 |    |    |-- x: string (nullable = true)
 |    |    |-- y: string (nullable = true)
 |    |    |-- z: string (nullable = true)
 |    |-- childd: struct (nullable = true)
 |    |    |-- x: string (nullable = true)
 |    |    |-- y: string (nullable = true)
 |    |    |-- z: string (nullable = true)

df1.show(10,False)

----------------------------------------------------------------
|parent                                                        |
----------------------------------------------------------------
|[,[x_value, y_value, z_value], ,[x_value, y_value, z_value]]  |
----------------------------------------------------------------

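The df1.show() output shows that childb and childd are not null.
I can get all of the child struct field names (childa, childb, childc, childd).
However, I only want the child struct field names that are not null.
The approach below puts all of the child struct field names into a list, which covers the first requirement above.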

spark.sql("""select parent.* from df1""").schema.fieldNames()
Output:
[childa, childb, childc, childd]

Now I want only the child struct field names that are not null, so that the list contains just childb and childd.
Expected output: [childb, childd]. Can anyone help?

apache-spark pyspark apache-spark-sql pyspark-dataframes struct

Source: https://stackoverflow.com/questions/66115838/get-only-those-field-names-which-are-not-null-from-a-pyspark-dataframe

1 answer

laawzig2 #1

You can use filter and count to check whether a field is null. A field is kept only if no row has a null value for it:

non_null_fields = [
    field
    for field in df1.select('parent.*').schema.fieldNames()
    # keep the field only if no row has a null value for it
    if df1.filter('parent.%s is null' % field).count() == 0
]

This gives:

['childb', 'childd']
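
The list comprehension above runs one Spark job per child field. If the struct has many fields, a single aggregation that counts the nulls of every field at once may be cheaper. The following is only a sketch of that idea, not part of the original answer: the helper names null_count_exprs, fields_without_nulls and non_null_fields are made up for illustration, and the parent column is assumed to be called parent, as in the question.

```python
def null_count_exprs(fields, parent="parent"):
    """Build one SQL aggregate per child field that counts the rows where it is null."""
    return [
        "count(CASE WHEN `{p}`.`{f}` IS NULL THEN 1 END) AS `{f}`".format(p=parent, f=f)
        for f in fields
    ]


def fields_without_nulls(fields, null_counts):
    """Keep the fields whose null count is zero, in their original order."""
    return [f for f in fields if null_counts[f] == 0]


def non_null_fields(df, parent="parent"):
    """Return the child fields of `parent` that are null in no row of `df`.

    `df` is assumed to be a PySpark DataFrame with a struct column named `parent`.
    """
    fields = df.select(parent + ".*").schema.fieldNames()
    # A select made only of aggregates yields a single row holding all the counts.
    row = df.selectExpr(*null_count_exprs(fields, parent)).first()
    return fields_without_nulls(fields, row.asDict())
```

Calling non_null_fields(df1) should then give the same list as the filter-and-count version, since both keep a field only when no row has it null, but it scans the data once instead of once per field.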

Answered on 2021-07-13
