在mysql中,在一个大表中取消填充多个列(并使用hadoop执行等效的操作?)

w8rqjzmb  于 2021-06-04  发布在  Hadoop
关注(0)|答案(0)|浏览(197)

使用mysql(不支持unpivot/pivot)对一个大表中的多个列执行unpivot的正确方法是什么?
我的平桌:

id  A1  A2  A3  B1  B2  B3  C1   D1   X  Y  Z and many other columns 
1   10  20  30  100 200 300 0.1 1000  ...
2   5   6   7   0   0   0   0.25   0
3   0   0   0   0   0   0   0.001  0

我的目标是只保留具有非零值的行:(注意c和d没有value2和value3),并且这些xyz列对于每个id都是相同的

id  Name    value1  value2  value3  X  Y  Z
1   A         10    20      30      ...
1   B         100   200     300     
1   C         0.1   0       0
1   D        1000   0       0
2   A          5    6       7
2   C        0.25   0       0
3   C       0.001   0       0

我是sql新手,我的做法是一步一步地创建一个临时表,每个步骤:


# 1. select only the columns that I need to unpivot

CREATE TABLE staging_1 as
(select id, A1, A2, A3, B1, B2, B3, C1, C2 from flattable)
----

# 2. create all four records for each id

CREATE TABLE staging_2 as
(select id, 'A' as Name, A1 as Value1, A2 as Value2, A3 as Value3 from staging_1)
union all
(select id, 'B', B1, B2, B3 from flattable)
union all
(select id, 'C', C1, 0, 0 from flattable)
union all
(select id, 'D', D1, 0, 0 from flattable)
----

# 3. remove those column with zero value

CREATE TABLE staging_3 as
(select id, Name, Value1, Value2, Value3
from staging_2
where (Name in ('A','B') and Value1<>0 and Value2<>0 and Value3<>0) 
or (Name in ('C','D') and Value1<>0)    #only need to check Value1 for C, D
----

# 4. join it back with the original data with id

select (all columns but A1,A2,A3...C2 and name, value1,2,3) from flattable
left join staging_3
on flattable.id = staging_3.id

但是,最后一步花费的时间太长,因为我有大约4mid和许多xyz列。这是执行这种数据操作的正确方法吗?
附加问题:我的最终目标是使用hadoop执行这个任务,但是我对hadoop完全陌生,我希望使用hive运行这个sql脚本(我只知道sql)。这有道理吗?或者我应该直接用hadoop语言编写我的脚本,比如pig/hive之类的(我正在使用hortonworks(沙盒)

暂无答案!

目前还没有任何答案,快来回答吧!

相关问题