How to replace null values with a value using coalesce in PySpark

jvlzgdj9 · posted 2023-08-02 in Spark

I have two files, orders_renamed.csv and customers.csv. I join them with a full outer join and then drop the duplicate column (customer_id). I want to replace the null values in the order_id column with -1.

This is what I tried:

from pyspark.sql.functions import regexp_extract, monotonically_increasing_id, unix_timestamp, from_unixtime, coalesce
from pyspark.sql.types import IntegerType, StructField, StructType, StringType

ordersDf = spark.read.format("csv").option("header", True).option("inferSchema", True).option("path", "C:/Users/Lenovo/Desktop/week12/week 12 dataset/orders_renamed.csv").load()

customersDf = spark.read.format("csv").option("header", True).option("inferSchema", True).option("path", "C:/Users/Lenovo/Desktop/week12/week 12 dataset/customers.csv").load()

joinCondition1 = ordersDf.customer_id == customersDf.customer_id

joinType1 = "outer"   

joinenullreplace = (
    ordersDf.join(customersDf, joinCondition1, joinType1)
    .drop(ordersDf.customer_id)
    .select("order_id", "customer_id", "customer_fname")
    .sort("order_id")
    .withColumn("order_id", coalesce("order_id", -1))
)

joinenullreplace.show(50)

As you can see in the last line, I used coalesce, but it gives me an error. I have tried multiple approaches, such as treating coalesce as an expression and applying expr, but it did not work. I have also used lit, but with no luck. Please reply with a solution.
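For intuition about what goes wrong: SQL-style coalesce returns the first non-null of its arguments, but in PySpark every argument must be a Column, so a bare -1 is rejected. A plain-Python analogue of the semantics (coalesce_py is a hypothetical helper for illustration, not part of PySpark):

```python
def coalesce_py(*values):
    """Return the first value that is not None, or None if all are None,
    mimicking SQL COALESCE semantics on ordinary Python values."""
    for v in values:
        if v is not None:
            return v
    return None

# A null order_id falls through to the default -1; a real id is kept.
print(coalesce_py(None, -1))  # -1
print(coalesce_py(7, -1))     # 7
```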

klr1opcd 1#

coalesce expects Column arguments, so the literal -1 must be wrapped with lit:

from pyspark.sql.functions import lit

.withColumn("order_id", coalesce("order_id", lit(-1)))

