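I installed Hadoop 3.1.0 and Spark 2.4.7 on 4 virtual machines, with 32 cores and 128 GB of memory in total. I have been running the following test in spark-shell: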
[hadoop@hadoop1 bin]$ hadoop fs -mkdir -p /user/hadoop/testdata
[hadoop@hadoop1 bin]$ hadoop fs -put /app/hadoop/hadoop-2.2.0/etc/hadoop/core-site.xml /user/hadoop/testdata
[hadoop@hadoop1 bin]$ spark-shell --master spark://hadoop1:7077
scala> val rdd = sc.textFile("hdfs://hadoop1:9000/user/hadoop/testdata/core-site.xml")
scala> rdd.cache()
scala> val wordcount = rdd.flatMap(_.split(" ")).map(x => (x, 1)).reduceByKey(_ + _)
scala> wordcount.take(10)
scala> val wordsort = wordcount.map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1))
scala> wordsort.take(10)
I have been experimenting with the following parameters:
spark.core.connection.ack.wait.timeout 600s
spark.default.parallelism 4
spark.driver.memory 6g
spark.executor.memory 6g
spark.cores.max 21
spark.executor.cores 3
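For reference, here is a rough sketch of what these settings ask the standalone master for. It is only arithmetic, not a Spark API: the object name `ExecutorEstimate` is made up for illustration, and the per-worker figures (4 identical workers, each offering 8 cores and 32 GB) are an assumption derived from the 32 core / 128 GB total, not something confirmed by the worker configuration.

```scala
// Hypothetical helper (not part of Spark): back-of-the-envelope executor count
// for a standalone cluster, using the parameter values listed above.
object ExecutorEstimate {
  // Values taken from the settings above
  val coresMax = 21         // spark.cores.max
  val executorCores = 3     // spark.executor.cores
  val executorMemoryGb = 6  // spark.executor.memory

  // ASSUMPTION: 4 identical workers, each offering 8 cores and 32 GB to Spark
  val workers = 4
  val workerCores = 8
  val workerMemoryGb = 32

  // How many executors fit on one worker, limited by cores and by memory
  val perWorker: Int =
    math.min(workerCores / executorCores, workerMemoryGb / executorMemoryGb)

  // How many executors the application may hold under spark.cores.max
  val capByCoresMax: Int = coresMax / executorCores

  // Executors the application can actually get on the assumed cluster
  val executors: Int = math.min(perWorker * workers, capByCoresMax)

  val totalCores: Int = executors * executorCores
  val totalMemoryGb: Int = executors * executorMemoryGb

  def main(args: Array[String]): Unit = {
    println(s"executors per worker (at most): $perWorker")
    println(s"executors for this application: $executors")
    println(s"cores used: $totalCores, executor memory used: $totalMemoryGb GB")
  }
}
```

Under those assumptions the request comes to 7 executors using 21 cores and 42 GB of executor memory, so on paper it fits within the cluster. If the workers actually offer less than assumed (the Workers table in the master web UI shows the real numbers), the result changes accordingly.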
With these settings I run into either
org.apache.spark.shuffle.FetchFailedException Failed to connect 192.168.0.XXX
or
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Is there a general guide for tuning these parameters, and any others that matter?