Spark: how to include null rows in LATERAL VIEW explode

66bbxpm5  asked on 2021-06-27  in  Hive

I have a table like the following:

user_id email
u1      e1, e2
u2      null

My goal is to transform it into this format:

user_id email
u1      e1
u1      e2
u2      null

Hive SQL:

select * FROM table LATERAL VIEW explode(split(email, ',')) email AS email_id;

When the above query is executed in Hive I do get the null row, but when the same query is run in Spark SQL the null row is missing. This problem and scenario have already been discussed here.

Spark SQL:

select * FROM table LATERAL VIEW OUTER explode(split(email, ',')) email AS email_id;
select * FROM table LATERAL VIEW POSEXPLODE_OUTER(split(email, ',')) email AS email_id;

The second one fails with a syntax error, and searching for LATERAL VIEW with posexplode_outer didn't turn up much. I want Spark SQL to keep the null rows.
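For reference, the behaviour described above can be reproduced in spark-shell without the Hive table; a minimal sketch that builds the sample data inline:

```scala
// Recreate the sample table from the question.
Seq(("u1", "e1, e2"), ("u2", null: String))
  .toDF("user_id", "email")
  .createOrReplaceTempView("table")

// Plain LATERAL VIEW explode drops u2: split(NULL, ',') is NULL,
// and exploding a NULL array produces no rows.
spark.sql(
  "SELECT * FROM table LATERAL VIEW explode(split(email, ',')) email AS email_id"
).show()
// +-------+------+--------+
// |user_id| email|email_id|
// +-------+------+--------+
// |     u1|e1, e2|      e1|
// |     u1|e1, e2|      e2|
// +-------+------+--------+
```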


s3fp2yjn1#

Spark 2.2.0 added support for LATERAL VIEW OUTER, for example:

scala> spark.sql(
     |   "select * FROM table LATERAL VIEW OUTER explode(split(email, ',')) email AS email_id"
     | ).show
+-------+------+--------+
|user_id| email|email_id|
+-------+------+--------+
|     u1|e1, e2|      e1|
|     u1|e1, e2|      e2|
|     u2|  null|    null|
+-------+------+--------+
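For completeness, the DataFrame API offers an equivalent explode_outer column function (introduced in the same 2.2 timeframe); a minimal sketch against the same data:

```scala
import org.apache.spark.sql.functions.{col, explode_outer, split}

val df = Seq(("u1", "e1, e2"), ("u2", null: String)).toDF("user_id", "email")

// explode_outer keeps rows whose array is NULL or empty,
// emitting NULL in the generated column instead of dropping the row.
df.withColumn("email_id", explode_outer(split(col("email"), ","))).show()
```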


bxjv4tth2#

Spark SQL doesn't use HiveQL. It is partially compatible, but the two shouldn't be confused. Instead of LATERAL VIEW you should use SELECT with explode_outer:

```
Seq(
  ("u1", "e1, e2"), ("u2", null)
).toDF("user_id", "email").createOrReplaceTempView("table")

spark.sql(
  "SELECT user_id, explode_outer(split(email, ',')) FROM table"
).show
// +-------+----+
// |user_id| col|
// +-------+----+
// |     u1|  e1|
// |     u1|  e2|
// |     u2|null|
// +-------+----+
```
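The generated column is named col by default; if the email_id name from the question is wanted, it can be aliased directly in the SELECT, e.g.:

```scala
spark.sql(
  "SELECT user_id, explode_outer(split(email, ',')) AS email_id FROM table"
).show
```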


lrl1mhuk3#

Adding a coalesce around the split seems to do the trick:

with tmp_table as (
  select 'u1' as user, 'e1,e2' as email
  union all
  select 'u2' as user, NULL as email
)
select * FROM tmp_table
LATERAL VIEW explode(coalesce(split(email, ','), array(NULL))) email AS email_id;

Output:

u1  e1,e2   e1
u1  e1,e2   e2
u2  NULL    NULL
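This works because coalesce replaces a NULL email with a one-element array containing NULL, so explode always has something to emit and the u2 row survives. The same idea expressed with the DataFrame API (a sketch; column names follow the tmp_table above):

```scala
import org.apache.spark.sql.functions.{array, coalesce, col, explode, lit, split}

val df = Seq(("u1", "e1,e2"), ("u2", null: String)).toDF("user", "email")

// array(lit(null).cast("string")) stands in for array(NULL) in the SQL above.
df.select(
  col("user"),
  col("email"),
  explode(coalesce(split(col("email"), ","), array(lit(null).cast("string")))).as("email_id")
).show()
```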
