Attempting to write Parquet data to S3 or to HDFS produces the same error, using the df.write overwrite call I already mentioned:
resFinal.write.mode(SaveMode.Overwrite).partitionBy("pro".......
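
For context, here is a minimal sketch of what the complete write presumably looks like. Only mode(SaveMode.Overwrite) and the partition column "pro" come from the question; the input and output paths and how resFinal is built are placeholders, since the original line is truncated.

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("history_cleanup").getOrCreate()

// Hypothetical input; the question does not show how resFinal is computed.
val resFinal = spark.read.parquet("s3://example-bucket/input/")

// Assumed shape of the truncated call: overwrite the target, partitioned
// by the "pro" column. The output path is a placeholder.
resFinal.write
  .mode(SaveMode.Overwrite)
  .partitionBy("pro")
  .parquet("s3://example-bucket/output/")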
The spark-submit command used for this job is:
spark-submit --master yarn --deploy-mode cluster \
  --executor-memory 50G --driver-memory 54G --executor-cores 5 --queue High \
  --conf spark.yarn.maxAppAttempts=1 \
  --conf spark.driver.maxResultSize=7g \
  --conf spark.executor.memoryOverhead=4500 \
  --conf spark.driver.memoryOverhead=5400 \
  --conf spark.sql.shuffle.partitions=7000 \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.spill.compress=true \
  --conf spark.sql.tungsten.enabled=true \
  --conf spark.sql.autoBroadcastJoinThreshold=-1 \
  --conf spark.speculation=true \
  --conf spark.dynamicAllocation.minExecutors=200 \
  --conf spark.dynamicAllocation.maxExecutors=500 \
  --conf spark.memory.storageFraction=0.6 \
  --conf spark.memory.fraction=0.7 \
  --class com.mnb.history_cleanup \
  s3://dv-cam/1/cleanup-1.0-SNAPSHOT.jar H 0 20170101-20170102 HSO
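
As an aside, a few of these --conf flags could equally be set programmatically on the SparkSession builder; the following is just an illustrative sketch with values copied from the command above, not something from the question:

import org.apache.spark.sql.SparkSession

// Equivalent programmatic form of some of the --conf flags; these must be
// set before the SparkContext is created, so on the builder, not afterwards.
val spark = SparkSession.builder()
  .appName("history_cleanup")
  .config("spark.sql.shuffle.partitions", "7000")
  .config("spark.sql.autoBroadcastJoinThreshold", "-1")
  .config("spark.speculation", "true")
  .getOrCreate()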
Whether I write to HDFS or to S3, I get:
org.apache.hadoop.fs.FileAlreadyExistsException: Path already exists as a file: s3://dv-ms-east-1/ms414x-test1/dl/ry8/.spark-staging-28e84dbb-7e91-4d5c-87ba-8e880cf28904/
or
File does not exist: /user/m/dl/vi/.spark-staging-bdb317f3-7ff9-458e-9ea8-7fb70ce4/pro