Spark DataFrame: alternative Scala code for a Hive max(CASE ...) statement

de90aj5v  posted on 2021-06-28  in Hive

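I have a DataFrame with four columns (id int, name string, mobile string, phone string).
I need an alternative way to implement the logic of the following Hive query in Scala code.
The Hive query is: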

SELECT id AS member_id
,MAX(CASE WHEN name = 'Mrs.' THEN mobile ELSE NULL END) AS mobile
,MAX(CASE WHEN name = 'Dr.' THEN phone ELSE NULL END) AS phone
FROM temp1
GROUP BY id;

Thanks.

kcrjzv8t  #1

You can register the DataFrame as a temporary table and run the same SQL against it:

// Spark 1.x: register the DataFrame as a temporary table, then query it
dataFrame.registerTempTable("temp1")
val result = sqlContext.sql(query) // query: a String holding the SQL from the question

Or, in Spark 2.0:

dataset.createTempView("temp1")
val result = sparkSession.sql(query) // query: a String holding the SQL from the question

Alternatively, you can use the Dataset API:

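import org.apache.spark.sql.functions.{col, max, udf}
import sparkSession.implicits._ // enables the $"col" syntax

// Keep mobile only for "Mrs." rows; null otherwise (max ignores nulls)
val mobileUDF = udf {
    (name: String, mobile: String) => if (name == "Mrs.") mobile else null
}
// Keep phone only for "Dr." rows, matching the query in the question
val phoneUDF = udf {
    (name: String, phone: String) => if (name == "Dr.") phone else null
}

dataset.withColumn("newMobile", mobileUDF($"name", $"mobile"))
    .withColumn("newPhone", phoneUDF($"name", $"phone"))
    .groupBy($"id")
    .agg(max(col("newMobile")).as("mobile"), max(col("newPhone")).as("phone"))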
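
This approach works because rows that do not match the condition contribute null, and max skips nulls within each group. As a rough illustration of that semantics only, here is a plain Scala collections sketch with no Spark involved. The `Contact` case class, the `MaxCaseSketch` object, and the use of `Option` in place of SQL NULL are assumptions made for this example, not part of the original code.

```scala
// Hypothetical row type mirroring the question's four columns.
final case class Contact(id: Int, name: String, mobile: String, phone: String)

object MaxCaseSketch {
  // Largest value among the defined entries; None plays the role of SQL NULL.
  // Strings are compared lexicographically.
  private def maxIgnoringNulls(values: Seq[Option[String]]): Option[String] =
    values.flatten.reduceOption((a, b) => if (a.compareTo(b) >= 0) a else b)

  // Returns id -> (mobile, phone), one entry per id, like GROUP BY id.
  def aggregate(rows: Seq[Contact]): Map[Int, (Option[String], Option[String])] =
    rows.groupBy(_.id).map { case (id, group) =>
      // CASE WHEN name = 'Mrs.' THEN mobile ELSE NULL END
      val mobile = maxIgnoringNulls(
        group.map(r => if (r.name == "Mrs.") Some(r.mobile) else None))
      // CASE WHEN name = 'Dr.' THEN phone ELSE NULL END
      val phone = maxIgnoringNulls(
        group.map(r => if (r.name == "Dr.") Some(r.phone) else None))
      (id, (mobile, phone))
    }
}
```

An id with no "Mrs." row ends up with `None` for mobile, which corresponds to the NULL the Hive query would return for that group.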
lc8prwob  #2

Try conditional aggregation with when, which returns null when the condition does not match:

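import org.apache.spark.sql.functions.{max, when}
// The 'id symbol syntax requires the session's implicits, e.g. import spark.implicits._

df.groupBy('id).agg(
  max(when('name === "Mrs.", 'mobile)).alias("mobile"),
  max(when('name === "Dr.", 'phone)).alias("phone")
)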
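
For anyone who wants to try this end to end, the following is a minimal sketch that assumes Spark 2.x or later on the classpath and a local master. The object name `MaxCaseSpark`, the helper `contactsById`, and the sample rows are made up for illustration. It also renames `id` to `member_id` so the output columns match the Hive query.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max, when}

object MaxCaseSpark {
  // when(...) without otherwise(...) yields null for non-matching rows,
  // which mirrors ELSE NULL; max then ignores those nulls per group.
  def contactsById(df: DataFrame): DataFrame =
    df.groupBy(col("id"))
      .agg(
        max(when(col("name") === "Mrs.", col("mobile"))).alias("mobile"),
        max(when(col("name") === "Dr.", col("phone"))).alias("phone")
      )
      .withColumnRenamed("id", "member_id") // id AS member_id

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run for experimentation
      .appName("max-case-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with the question's four columns.
    val df = Seq(
      (1, "Mrs.", "111", "aaa"),
      (1, "Dr.", "222", "bbb"),
      (2, "Dr.", "444", "ddd")
    ).toDF("id", "name", "mobile", "phone")

    contactsById(df).orderBy("member_id").show()
    spark.stop()
  }
}
```

Using `col("...")` instead of symbol literals avoids depending on the implicits import inside the helper, at the cost of slightly more verbose expressions.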
