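I have a schema with 9 fields and I only need two of them (fields 6 and 7, i.e. $5 and $6). I want to compute the average of $5 and sort $6 in ascending order. How can I do this? Can anyone help?
Input data: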
N368SW 188 170 175 17 -1 MCO MHT 1142
N360SW 100 115 87 -10 5 MCO MSY 550
N626SW 114 115 90 13 14 MCO MSY 550
N252WN 107 115 84 -10 -2 MCO MSY 550
N355SW 104 115 85 -1 10 MCO MSY 550
N405WN 113 110 96 14 11 MCO ORF 655
N456WN 110 110 92 24 24 MCO ORF 655
N743SW 144 155 124 7 18 MCO PHL 861
N276WN 142 150 129 -2 6 MCO PHL 861
N369SW 153 145 134 30 22 MCO PHL 861
N363SW 151 145 137 5 -1 MCO PHL 861
N346SW 141 150 128 51 60 MCO PHL 861
N785SW 131 145 118 -15 -1 MCO PHL 861
N635SW 144 155 127 -6 5 MCO PHL 861
N242WN 298 300 276 68 70 MCO PHX 1848
N439WN 130 140 111 -4 6 MCO PIT 834
N348SW 140 135 124 7 2 MCO PIT 834
N672SW 136 135 122 9 8 MCO PIT 834
N493WN 151 160 136 -9 0 MCO PVD 1073
N380SW 170 155 155 13 -2 MCO PVD 1073
N705SW 164 160 147 6 2 MCO PVD 1073
N233LV 157 160 143 1 4 MCO PVD 1073
N786SW 156 160 139 6 10 MCO PVD 1073
N280WN 160 160 146 1 1 MCO PVD 1073
N282WN 104 95 81 10 1 MCO RDU 534
N694SW 89 100 77 3 14 MCO RDU 534
N266WN 94 95 82 9 10 MCO RDU 534
N218WN 98 100 77 12 14 MCO RDU 534
N355SW 47 50 35 15 18 MCO RSW 133
N388SW 44 45 30 37 38 MCO RSW 133
N786SW 46 50 31 4 8 MCO RSW 133
N707SA 52 50 33 10 8 MCO RSW 133
N795SW 176 185 153 -9 0 MCO SAT 1040
N402WN 176 185 161 4 13 MCO SAT 1040
N690SW 123 130 107 -1 6 MCO SDF 718
N457WN 135 130 105 20 15 MCO SDF 718
N720WN 144 155 131 13 24 MCO STL 880
N775SW 147 160 135 -6 7 MCO STL 880
N291WN 136 155 122 96 115 MCO STL 880
N247WN 144 155 127 43 54 MCO STL 880
N748SW 179 185 159 -4 2 MDW ABQ 1121
N709SW 176 190 158 21 35 MDW ABQ 1121
N325SW 110 105 97 36 31 MDW ALB 717
N305SW 116 110 90 107 101 MDW ALB 717
N403WN 145 165 128 -6 14 MDW AUS 972
N767SW 136 165 125 59 88 MDW AUS 972
N730SW 118 120 100 28 30 MDW BDL 777
I have written the following code, but it does not work:
a = load '/path/to/file' using PigStorage('\t');
b = foreach a generate (int)$5 as field_a:int,(chararray)$6 as field_b:chararray;
c = group b all;
d = foreach c generate b.field_b,AVG(b.field_a);
e = order d by field_b ASC;
dump e;
I get an error at the ORDER step:
grunt> a = load '/user/horton/sample_pig_data.txt' using PigStorage('\t');
grunt> b = foreach a generate (int)$5 as fielda:int,(chararray)$6 as fieldb:chararray;
grunt> describe @;
b: {fielda: int,fieldb: chararray}
grunt> c = group b all;
grunt> describe @;
c: {group: chararray,b: {(fielda: int,fieldb: chararray)}}
grunt> d = foreach c generate b.fieldb,AVG(b.fielda);
grunt> e = order d by fieldb ;
2017-01-05 15:51:29,623 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 6, column 15> Invalid field projection. Projected field [fieldb] does not exist in schema: :bag{:tuple(fieldb:chararray)},:double.
Details at logfile: /root/pig_1483631021021.log
I would like output in this form (the values shown are unrelated to the input data above):
(({(Bharathi),(Komal),(Archana),(Trupthi),(Preethi),(Rajesh),(siddarth),(Rajiv) },
{ (72) , (83) , (87) , (75) , (93) , (90) , (78) , (89) }),83.375)
1 Answer
If you have already found the answer, it is good practice to post it so that others can benefit from it.
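For what it is worth, the error message in the question points at the cause: after `group b all`, relation `d` holds a single tuple whose first field is an unnamed bag of `fieldb` values, so there is no top-level field called `fieldb` for `order` to sort on. One possible fix, sketched below and not run against the data above, is to sort the bag inside a nested `foreach`. The alias `sorted` is a name chosen here for illustration. The path, the tab delimiter, and the field names are taken from the question.

```pig
-- Load the file; the tab delimiter is assumed from the question's script.
a = LOAD '/path/to/file' USING PigStorage('\t');

-- Keep only the two fields of interest, with explicit types.
b = FOREACH a GENERATE (int)$5 AS field_a, (chararray)$6 AS field_b;

-- Collect every row into one group so AVG sees the whole column.
c = GROUP b ALL;

-- Sort inside the bag (nested ORDER), then emit the sorted
-- field_b values together with the average of field_a.
d = FOREACH c {
    sorted = ORDER b BY field_b ASC;   -- 'sorted' is an illustrative alias
    GENERATE sorted.field_b, AVG(b.field_a);
};

DUMP d;
```

A relation-level `order` sorts the tuples of a relation, and `d` contains only one tuple, so the sorting has to happen on the bag inside that tuple rather than on `d` itself.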