配置单元:如何消除重复的子字符串

mqkwyuun  于 2021-06-25  发布在  Hive
关注(0)|答案(1)|浏览(257)

配置单元表:

create table T (i int, s string);

  insert into T values
  (1, "a1&b2"),
  (1, "b2&c3"),
  (2, "c1&d2"),
  (2, "c1");

s列包含由分隔的值&
所需的输出应按第1列分组并连接s列,但只有一个唯一的子字符串值(用&)分隔):

i    grouped_s
   --   -------------
   1    a1&b2&c3
   2    c1&d2

以下是我的尝试:

SELECT i,
concat_ws('&',
 collect_set(
  split(concat_ws('&' , collect_set(s)), "&" )
 )
)
as grouped_s
FROM T group by i;

我知道了:

FAILED: SemanticException [Error 10128]: Line 6:24 Not yet supported place for UDAF 'collect_set'

另外,我想使它不使用嵌套的sql。

bfhwhh0e

bfhwhh0e1#

使用侧视图+分解:

select t.i, concat_ws('&',collect_set(e.val)) as grouped_s
  from T t 
       lateral view outer explode(split(t.s,'&')) e as val
 group by t.i;

结果:

t.i grouped_s
1   a1&b2&c3
2   c1&d2

相关问题