配置单元中的阵列操作(添加阵列)

sh7euo9m  于 2021-06-26  发布在  Hive
关注(0)|答案(3)|浏览(373)

我有一个配置单元表,其列id(string)、val(string)为:

id,val
abc,{0|1|0}
abc,{0|1|1}
abc,{1|0|1|1}

我想添加按id列分组的val列。预期结果是:

id,val
abc,{1|2|2|1}

这个结果可以通过并行添加数组得到。
我已经尝试过使用横向视图爆炸,然后作为int等铸造。但未能达到预期效果。我知道使用自定义项也是一种选择,但是否有任何其他方式在Hive只。
任何建议都会有帮助。
谢谢

uhry853o

uhry853o1#

从更换 { , } 用空格, split 字符串和使用 lateral viewposexplode 把同一位置的数字加起来。

select id,pos,sum(split_val) as total
from lateral view posexplode(split(regexp_replace(val,'[{}]',''),'\\|')) tbl as pos,split_val
group by id,pos

然后使用 collect_list 生成最终数组。

select id,collect_list(total)
from (select id,pos,sum(split_val) as total
      from lateral view posexplode(split(regexp_replace(val,'[{}]',''),'\\|')) tbl as pos,split_val
      group by id,pos
     ) t
group by id
9ceoxa92

9ceoxa922#

配置单元表mytab:

+----------+------------+
|    id    |     val    |
+----------+------------+
|   abc    | {0|1|0}    |
|   abc    | {0|1|1}    |
|   abc    | {1|0|1|1}  |
+----------+------------+

预期产量:

+----------+------------+
|    id    |     val    |
+----------+------------+
|   abc    | {1|2|2|1}  |
+----------+------------+

使用的配置单元查询:

select id,concat('{',concat_ws('|',(collect_list(cast(cast(expl_val_sum as int)as string)))),'}') as coll_expl_val 
from(
select id,index,sum(expl_val) as expl_val_sum
from mytab 
lateral view posexplode(split(regexp_replace(val,'[{}]',''),'\\|')) exp as index,expl_val
group by id,index)a
group by id;
1.First posexplode is used which explodes the array[String].
2.Then based on the index column the array values are added up parallelly.
3.Then cast as int is used to convert from decimal values to integer.
4.Then cast as String and then again converted to array[string] using collect_list.
5.Next the values of array are '|' delimited using concat_ws function.
6.Next concat function is used to append '{' and '}'.

谢谢你的回复。

rbpvctlc

rbpvctlc3#

这是一种可能的方法,可能有更好的方法

select * from tbl1;

+----------+------------+--+
| tbl1.id  |  tbl1.val  |
+----------+------------+--+
| abc      | {0|1|0}    |
| abc      | {0|1|1}    |
| abc      | {1|0|1|1}  |
+----------+------------+--+

把它写在没有 {} ```
insert overwrite directory '/user/cloudera/tbl2'
row format delimited fields terminated by ','
select id, substr(val,2,length(val)-2) as val2 from tbl1

创建一个表来使用它

create external table tbl3(id string, val array)
row format delimited
fields terminated by ','
collection items terminated by '|'
location '/user/cloudera/tbl2'

+----------+------------+--+
| tbl3.id | tbl3.val |
+----------+------------+--+
| abc | [0,1,0] |
| abc | [0,1,1] |
| abc | [1,0,1,1] |
+----------+------------+--+

使用 `posexplode` ```
select id, collect_list(val) 
from (
  select id, sum(c) as val 
    from (
      select id, i, c from tbl3 
      lateral view posexplode(val) v1 as i, c 
    ) tbl 
  group by id, i
  ) tbl2 
group by id

结果呢

+------+------------+--+
|  id  |    _c1     |
+------+------------+--+
| abc  | [1,2,2,1]  |
+------+------------+--+

相关问题