使用mysql(不支持unpivot/pivot)对一个大表中的多个列执行unpivot的正确方法是什么?
我的平桌:
id A1 A2 A3 B1 B2 B3 C1 D1 X Y Z and many other columns
1 10 20 30 100 200 300 0.1 1000 ...
2 5 6 7 0 0 0 0.25 0
3 0 0 0 0 0 0 0.001 0
我的目标是只保留具有非零值的行:(注意c和d没有value2和value3),并且这些xyz列对于每个id都是相同的
id Name value1 value2 value3 X Y Z
1 A 10 20 30 ...
1 B 100 200 300
1 C 0.1 0 0
1 D 1000 0 0
2 A 5 6 7
2 C 0.25 0 0
3 C 0.001 0 0
我是sql新手,我的做法是一步一步地创建一个临时表,每个步骤:
# 1. select only the columns that I need to unpivot
CREATE TABLE staging_1 as
(select id, A1, A2, A3, B1, B2, B3, C1, C2 from flattable)
----
# 2. create all four records for each id
CREATE TABLE staging_2 as
(select id, 'A' as Name, A1 as Value1, A2 as Value2, A3 as Value3 from staging_1)
union all
(select id, 'B', B1, B2, B3 from flattable)
union all
(select id, 'C', C1, 0, 0 from flattable)
union all
(select id, 'D', D1, 0, 0 from flattable)
----
# 3. remove those column with zero value
CREATE TABLE staging_3 as
(select id, Name, Value1, Value2, Value3
from staging_2
where (Name in ('A','B') and Value1<>0 and Value2<>0 and Value3<>0)
or (Name in ('C','D') and Value1<>0) #only need to check Value1 for C, D
----
# 4. join it back with the original data with id
select (all columns but A1,A2,A3...C2 and name, value1,2,3) from flattable
left join staging_3
on flattable.id = staging_3.id
但是,最后一步花费的时间太长,因为我有大约4mid和许多xyz列。这是执行这种数据操作的正确方法吗?
附加问题:我的最终目标是使用hadoop执行这个任务,但是我对hadoop完全陌生,我希望使用hive运行这个sql脚本(我只知道sql)。这有道理吗?或者我应该直接用hadoop语言编写我的脚本,比如pig/hive之类的(我正在使用hortonworks(沙盒)
暂无答案!
目前还没有任何答案,快来回答吧!