我在apache pig中运行foreach操作时遇到一个outofmemory错误。
16/06/24 15:14:17 INFO util.SpillableMemoryManager: first memory
handler call- Usage threshold init = 164102144(160256K) used =
556137816(543103K) committed = 698875904(682496K) max =
698875904(682496K)
java.lang.OutOfMemoryError: Java heap space
-XX:OnOutOfMemoryError="kill -9 %p"
Executing /bin/sh -c "kill -9 4095"... Killed
我的Pig剧本:
A = LOAD 'PageCountTest/' USING PigStorage(' ') AS (Project:chararray,
Title:chararray, count:int , size:int);
B = GROUP A BY (Project,Title);
C = FOREACH B
generate group, SUM(A.count) AS COUNT; D = ORDER C BY COUNT DESC;
STORE C INTO '/user/hadoop/wikistats';
样本数据:
aa.b Main_Page 1 14335
aa.d India 1 4075
aa.d Main_Page 1 13190
aa.d Special:RecentChanges 1 200
aa.d Talk:Main_Page/ 1 14147
aa.d w/w/index.php 9 137502
aa Main_Page 6 9872
aa Special:Statistics 1 324
有人能帮忙吗?
1条答案
按热度按时间3pvhb19x1#
我确实怀疑当你订购时会出现内存问题,因为它是最重的。最简单的方法是在启动pig作业时使用堆大小参数。
也可以直接在脚本中声明堆大小。