Convert a Hive query to Spark?

flseospp · asked 2021-07-13 in Spark

I am new to Spark. We run all of our Hive statements through Spark DataFrames. I know the whole statement can be executed with spark.sql, but we need to express the Hive query with Spark transformations instead.

The Hive query:

select
  REGEXP_REPLACE(nid, '"', '') as hash,
  REGEXP_REPLACE(emailnew, '"', '') as email,
  REGEXP_REPLACE(employer1website, '"', '') as domain,
  REGEXP_REPLACE(city, '"', '') as city,
  REGEXP_REPLACE(state, '"', '') as state,
  REGEXP_REPLACE(email_domain, '"', '') as emaildomain
from mytestdb.cookiedata_cleansed
where REGEXP_REPLACE(nid, '"', '') in (select REGEXP_REPLACE(md5, '"', '') from mytestdb.ckg_distincthash_01dec)
group by
  REGEXP_REPLACE(nid, '"', ''),
  REGEXP_REPLACE(emailnew, '"', ''),
  REGEXP_REPLACE(employer1website, '"', ''),
  REGEXP_REPLACE(city, '"', ''),
  REGEXP_REPLACE(state, '"', ''),
  REGEXP_REPLACE(email_domain, '"', '');

I can get this far; how do I bring in the second DataFrame here, and handle the WHERE condition with the regex?

import org.apache.spark.sql.functions._

// In Scala the arguments must be String literals ("\"" and ""), not Char
// literals; also, .alias() inside withColumn has no effect, so use select
// to clean and rename the columns in one step.
val df3 = df.select(
  regexp_replace(col("nid"), "\"", "").alias("hash"),
  regexp_replace(col("emailnew"), "\"", "").alias("email"),
  regexp_replace(col("employer1website"), "\"", "").alias("domain"),
  regexp_replace(col("city"), "\"", "").alias("city")
)
