Manipulating data structures in Pig

wi3ka0sx  posted on 2021-06-02  in  Hadoop

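I previously asked how to manipulate a data structure in Hive or Pig. I got an answer in SQL and was able to work out the Hive solution from it. I am still looking for a way to do this in Pig.
I want to change my table:

into myTable2:

I tried: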

myTable2 = FOREACH myTable GENERATE item, year, 
'jan' AS month, jan AS value, 
'feb' AS month, feb AS value,  
'mar' AS month, mar AS value;

This is more or less how it works in Hive, but Pig gives me:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108:
<line 2, column 35> Duplicate schema alias: month

siotufzp  #1

Pig script:

data = LOAD '/pigsamples/sampledata'  USING PigStorage(',') 
       AS (item:CHARARRAY, year:INT, jan:DOUBLE, feb:DOUBLE, mar:DOUBLE);

--concatenate each month name with its value so that the pair stays together when the bag is flattened.
concat_data =  FOREACH data GENERATE item, year, CONCAT('jan:', (CHARARRAY)jan) AS jan, 
               CONCAT('feb:', (CHARARRAY)feb) AS feb, CONCAT('mar:', (CHARARRAY)mar) AS mar;

--convert the month (name,value) pairs to a bag and flatten them
flatten_values = FOREACH concat_data GENERATE item, year, FLATTEN (TOBAG (jan, feb, mar)) AS month_values;

--split the string based on the delimiter that we used above to concat
split_flatten_values = FOREACH flatten_values GENERATE item, year, FLATTEN (STRSPLIT (month_values, ':')) AS (month:CHARARRAY, value:CHARARRAY);
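
Because STRSPLIT returns strings, value ends up as a CHARARRAY in the script above. If a numeric column is needed, one extra projection could cast it back. This is only a sketch: it assumes every split-out value parses as a number, and typed_values is a hypothetical alias that does not appear in the original answer.

```pig
-- Hypothetical follow-up step (not part of the original answer):
-- cast the string produced by STRSPLIT back to DOUBLE.
-- Assumes split_flatten_values is the relation defined above.
typed_values = FOREACH split_flatten_values
               GENERATE item, year, month, (DOUBLE)value AS value;
```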

mcdcgff0  #2

I figured it out, though I would love to see a more concise version:

JAN = FOREACH myTable GENERATE item, year, 'jan' AS month, jan AS value;
FEB = FOREACH myTable GENERATE item, year, 'feb' AS month, feb AS value;
MAR = FOREACH myTable GENERATE item, year, 'mar' AS month, mar AS value;
myTable2 = UNION JAN, FEB, MAR;
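
One possibly more compact variant, offered as a sketch rather than a tested solution: build one (month, value) tuple per column, collect the tuples in a bag, and flatten the bag so that each input row yields three output rows. It assumes myTable has the fields item, year, jan, feb and mar as in the question, and that the month columns are DOUBLE as in the LOAD schema of the first answer.

```pig
-- Sketch: unpivot jan/feb/mar in a single FOREACH.
-- TOTUPLE pairs each month name with its value, TOBAG collects the pairs,
-- and FLATTEN emits one output row per pair.
myTable2 = FOREACH myTable GENERATE item, year,
           FLATTEN(TOBAG(TOTUPLE('jan', jan),
                         TOTUPLE('feb', feb),
                         TOTUPLE('mar', mar))) AS (month:CHARARRAY, value:DOUBLE);
```

Unlike the string-concatenation approach, this keeps value numeric, and it avoids the duplicate-alias error because month and value are each declared only once.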
