Spark task gets stuck indefinitely while writing data to HDFS

Asked by 2admgd59 on 2021-07-12 in Spark

I am using Hadoop's Java API in my Spark application to move a local directory to a location in HDFS. Sometimes, however, a few tasks get stuck waiting on a lock. I can capture the thread dump below from the Spark UI, but I cannot make sense of it. Can you help? Happy to provide more information.
Spark: 2.4.0, HDFS: 2.7.1.2.4.0.0-169, Java: 1.8.0_60
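
For reference, a minimal sketch of the kind of call described above, assuming the copy is driven from Dataset.foreachPartition as the trace below suggests. The object name, input column, and paths are illustrative assumptions, not the asker's actual code.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.{Row, SparkSession}

object CopyLocalDirsToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("copy-local-dirs").getOrCreate()
    import spark.implicits._

    // Assumed input: one local directory path per row.
    val dirs = Seq("/tmp/batch-0001", "/tmp/batch-0002").toDF("local_dir")

    dirs.foreachPartition { rows: Iterator[Row] =>
      // Create the FileSystem handle on the executor so nothing is serialized
      // from the driver.
      val fs = FileSystem.get(new Configuration())
      val target = new Path("/data/landing") // assumed HDFS target directory
      rows.foreach { row =>
        val localDir = new Path(row.getAs[String]("local_dir"))
        // copyFromLocalFile(delSrc, overwrite, src, dst): with delSrc = true the
        // local source is deleted after the copy, i.e. a "move" into HDFS.
        // This is the FileSystem.copyFromLocalFile call visible in the trace.
        fs.copyFromLocalFile(true, true, localDir, target)
      }
    }

    spark.stop()
  }
}

The thread dump from the Spark UI for a stuck task: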

java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:502)
org.apache.hadoop.hdfs.DFSOutputStream.waitAndQueueCurrentPacket(DFSOutputStream.java:1755)
org.apache.hadoop.hdfs.DFSOutputStream.writeChunkImpl(DFSOutputStream.java:1839) => holding Monitor(org.apache.hadoop.hdfs.DFSOutputStream@631394790})
org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:1788) => holding Monitor(org.apache.hadoop.hdfs.DFSOutputStream@631394790})
org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:206)
org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:124)
org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:110) => holding Monitor(org.apache.hadoop.hdfs.DFSOutputStream@631394790})
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
java.io.DataOutputStream.write(DataOutputStream.java:107) => holding Monitor(org.apache.hadoop.hdfs.client.HdfsDataOutputStream@1598458848})
org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:356)
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:356)
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1965)
org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1933)
org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1898)

org.apache.spark.sql.Dataset$$anonfun$foreachPartition$2.apply(Dataset.scala:2747)
org.apache.spark.sql.Dataset$$anonfun$foreachPartition$2.apply(Dataset.scala:2747)
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
org.apache.spark.scheduler.Task.run(Task.scala:121)
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)

No answers yet.