EMP_ID PRD_NO PRD_DATE PRD_TOTAL PRD_NORM
IND235 00020 28/Mar/2015 02:00:50 11 60.00
IND235 00018 27/Mar/2015 03:10:40 7 60.00
IND235 00019 28/Mar/2015 04:00:54 3 60.00
IND235 00020 27/Mar/2015 05:00:51 11 60.00
PUR266 00044 28/Mar/2015 01:20:50 85 100.00
PUR266 00024 28/Mar/2015 06:30:60 33 100.00
PUR266 00017 27/Mar/2015 05:30:05 11 100.00
PUR266 00038 27/Mar/2015 02:30:15 60 100.00
I expect the following output:
IND235,27/Mar/2015,60,18,42
IND235,28/Mar/2015,60,14,46
PUR266,27/Mar/2015,100,71,29
PUR266,28/Mar/2015,100,118,-18
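The third column is PRD_NORM, the fourth is the sum of PRD_TOTAL grouped by EMP_ID and PRD_DATE (date part only, ignoring the time), and the last column is PRD_NORM minus that sum.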
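I'm just starting to learn the ins and outs of Pig Latin. Is there already a built-in way to do this in Pig or in some library, or should I look into writing a UDF?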
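For reference, here is a minimal Pig Latin sketch of the computation described above. It uses only built-in operators and functions (GROUP, FOREACH, FLATTEN, SUM, MAX, PigStorage), so no UDF appears to be needed. It assumes the data is in a file called `input.txt` (a hypothetical path) with fields separated by single spaces, so the date and the time inside PRD_DATE load as two separate columns. The aliases and the output directory `output` are also illustrative.

```pig
-- Hypothetical input path; assumes single-space-separated fields, so the
-- date and the time of PRD_DATE land in separate columns.
raw_rows = LOAD 'input.txt' USING PigStorage(' ')
           AS (emp_id:chararray, prd_no:chararray, prd_date:chararray,
               prd_time:chararray, prd_total:int, prd_norm:double);

-- Skip the header line.
rows = FILTER raw_rows BY emp_id != 'EMP_ID';

-- Group on employee and calendar date only; the time of day is ignored.
by_day = GROUP rows BY (emp_id, prd_date);

-- Sum PRD_TOTAL per group. MAX only carries PRD_NORM through, assuming it is
-- constant per employee (as in the sample); the cast turns 60.0 into 60.
sums = FOREACH by_day GENERATE
           FLATTEN(group) AS (emp_id, prd_date),
           (int) MAX(rows.prd_norm) AS prd_norm,
           SUM(rows.prd_total) AS prd_total;

-- Last column: PRD_NORM minus the summed PRD_TOTAL.
result = FOREACH sums GENERATE emp_id, prd_date, prd_norm, prd_total,
             prd_norm - prd_total AS prd_diff;

-- Comma-separated output, e.g. IND235,27/Mar/2015,60,18,42
STORE result INTO 'output' USING PigStorage(',');
```

The order of the stored rows is not guaranteed; adding `ORDER result BY emp_id, prd_date` before the STORE would fix it if that matters. If PRD_NORM could vary within a group, MAX would no longer be a safe way to carry it, and it would be better to add it to the GROUP key instead.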
Answer (xriantvc):
Try this:
Input file:
Output: