I'm trying to do some work with Spark. Whenever df.count() is called, the job fails and I get the following log output:
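For reference, the count is triggered by code along these lines (a minimal sketch only; the app name and input path are placeholders, not my real job):

    from pyspark.sql import SparkSession

    # Build (or reuse) the session; count() runs as a distributed job.
    spark = SparkSession.builder.appName("count-example").getOrCreate()

    # Read the input; "data.parquet" is a placeholder path.
    df = spark.read.parquet("data.parquet")

    # This action launches the job shown in the log below.
    print(df.count())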
As the job starts...
Starting job: count at NativeMethodAccessorImpl.java:0
Registering RDD 24 (count at NativeMethodAccessorImpl.java:0) as input to shuffle 0
Got job 2 (count at NativeMethodAccessorImpl.java:0) with 1 output partitions
Final stage: ResultStage 3 (count at NativeMethodAccessorImpl.java:0)
Submitting ShuffleMapStage 2 (MapPartitionsRDD[24] at count at NativeMethodAccessorImpl.java:0), which has no missing parents
Submitting 5 missing tasks from ShuffleMapStage 2 (MapPartitionsRDD[24] at count at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4))
Then it fails...
Job 2 failed: count at NativeMethodAccessorImpl.java:0, took 32.116609 s
INFO DAGScheduler: ShuffleMapStage 2 (count at NativeMethodAccessorImpl.java:0) failed in 32.053 s due to Stage cancelled because SparkContext was shut down
It looks like there are simply too many rows. I'm new to Spark; is there any way to handle this, perhaps a configuration option?
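Are these the kinds of settings I should be looking at? A minimal sketch of what I mean, assuming the session is built in the script; the values are placeholder guesses, not tuned, and spark.executor.memory / spark.sql.shuffle.partitions / spark.driver.memory are standard Spark settings:

    from pyspark.sql import SparkSession

    # Sketch only: placeholder values for standard Spark settings.
    spark = (
        SparkSession.builder
        .appName("count-job")
        # Heap per executor; too little memory can kill executors mid-stage.
        .config("spark.executor.memory", "4g")
        # Number of partitions used for shuffles (the count above triggers one).
        .config("spark.sql.shuffle.partitions", "200")
        .getOrCreate()
    )

    # Note: spark.driver.memory normally has to be set before the driver JVM
    # starts, e.g. spark-submit --driver-memory 4g my_job.py
    df = spark.read.parquet("data.parquet")  # placeholder path
    print(df.count())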