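I am loading two files, about 900 MB in total, with wholeTextFiles() (WholeTextFileRDD). My machine has 8 GB of RAM and my cluster has 1 TB of memory, but the job fails with the error shown in the log below.
My questions are:
1. How do I run this correctly in local mode (including the JVM options)?
2. Do wholeTextFiles() and textFile() differ in how much JVM heap space they use?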
20/11/30 16:02:07 INFO WholeTextFileRDD: Input split: Paths:/Users/l/PycharmProjects/lrhs/lhs.csv:0+361008379
20/11/30 16:02:07 INFO WholeTextFileRDD: Input split: Paths:/Users/l/PycharmProjects/lrhs/rhs.csv:0+501024316
20/11/30 16:02:11 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
at java.nio.CharBuffer.allocate(CharBuffer.java:335)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:795)
at org.apache.hadoop.io.Text.decode(Text.java:412)
at org.apache.hadoop.io.Text.decode(Text.java:389)
at org.apache.hadoop.io.Text.toString(Text.java:280)
at org.apache.spark.SparkContext.$anonfun$wholeTextFiles$2(SparkContext.scala:943)
at org.apache.spark.SparkContext$$Lambda$1058/1541599386.apply(Unknown Source)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.executor.Executor$TaskRunner$$Lambda$1016/751565924.apply(Unknown Source)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
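A rough back-of-the-envelope check may help in reading the trace above. The failing frames (Text.decode, CharsetDecoder.decode, CharBuffer.allocate) suggest that each whole file is being decoded into a single in-memory string. The sketch below estimates the heap that one such decode would need, using the split sizes from the log. The 1 GiB heap is an assumption (it is Spark's documented default for spark.driver.memory; the post does not say what heap was actually in use), and the helper names `char_buffer_bytes` and `report` are made up for illustration.

```python
# Rough heap estimate for decoding one whole file into a Java String,
# following the frames in the trace:
# Text.decode -> CharsetDecoder.decode -> CharBuffer.allocate.

# Split sizes in bytes, copied from the "Input split" log lines.
SPLIT_BYTES = {
    "lhs.csv": 361_008_379,
    "rhs.csv": 501_024_316,
}

BYTES_PER_JAVA_CHAR = 2  # a Java char is a 16-bit UTF-16 code unit

# Assumption: a 1 GiB heap, Spark's documented default for
# spark.driver.memory. The real heap size is not stated in the post.
ASSUMED_HEAP_BYTES = 1 * 1024**3


def char_buffer_bytes(file_bytes: int) -> int:
    """Approximate size of the CharBuffer used to decode the file,
    assuming roughly one char per input byte (mostly ASCII CSV)."""
    return file_bytes * BYTES_PER_JAVA_CHAR


def report(split_bytes=SPLIT_BYTES, heap_bytes=ASSUMED_HEAP_BYTES):
    """Return (file name, estimated bytes needed, exceeds heap?) per file."""
    rows = []
    for name, size in split_bytes.items():
        # Raw file bytes already held in memory plus the decode buffer.
        needed = size + char_buffer_bytes(size)
        rows.append((name, needed, needed > heap_bytes))
    return rows


for name, needed, too_big in report():
    print(f"{name}: ~{needed / 1024**3:.2f} GiB needed, "
          f"exceeds assumed heap: {too_big}")
```

If this estimate is in the right range, two things follow. In local mode the tasks run inside the driver JVM, so the setting to raise would be the driver heap (spark.driver.memory, or --driver-memory with spark-submit) rather than executor memory. And textFile() produces one record per line instead of one record per file, so it should not need a single allocation of this size.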