I am trying to set up Hive on Spark on a single small VM (4 GB RAM), but I cannot get it to process queries.
For example, this query: SELECT max(price) FROM rentflattoday
The query hangs in what appears to be an infinite loop and produces the following container log:
2019-02-24 14:41:35 INFO SignalUtils:54 - Registered signal handler for TERM
2019-02-24 14:41:35 INFO SignalUtils:54 - Registered signal handler for HUP
2019-02-24 14:41:35 INFO SignalUtils:54 - Registered signal handler for INT
2019-02-24 14:41:35 INFO SecurityManager:54 - Changing view acls to: hadoop
2019-02-24 14:41:35 INFO SecurityManager:54 - Changing modify acls to: hadoop
2019-02-24 14:41:35 INFO SecurityManager:54 - Changing view acls groups to:
2019-02-24 14:41:35 INFO SecurityManager:54 - Changing modify acls groups to:
2019-02-24 14:41:35 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
2019-02-24 14:41:36 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-02-24 14:41:37 INFO ApplicationMaster:54 - Preparing Local resources
2019-02-24 14:41:39 INFO ApplicationMaster:54 - ApplicationAttemptId: appattempt_1551033757513_0011_000001
2019-02-24 14:41:39 INFO ApplicationMaster:54 - Starting the user application in a separate Thread
2019-02-24 14:41:39 INFO ApplicationMaster:54 - Waiting for spark context initialization...
2019-02-24 14:41:39 INFO RemoteDriver:125 - Connecting to: weirv1:42832
2019-02-24 14:41:39 INFO HiveConf:187 - Found configuration file file:/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/28/__spark_conf__.zip/__hadoop_conf__/hive-site.xml
2019-02-24 14:41:40 WARN HiveConf:5214 - HiveConf of name hive.enforce.bucketing does not exist
2019-02-24 14:41:40 WARN Rpc:170 - Invalid log level null, reverting to default.
2019-02-24 14:41:41 INFO SparkContext:54 - Running Spark version 2.4.0
2019-02-24 14:41:41 INFO SparkContext:54 - Submitted application: Hive on Spark (sessionId = 94aded5e-fbeb-4839-af11-9c5f5902fa0c)
2019-02-24 14:41:41 INFO SecurityManager:54 - Changing view acls to: hadoop
2019-02-24 14:41:41 INFO SecurityManager:54 - Changing modify acls to: hadoop
2019-02-24 14:41:41 INFO SecurityManager:54 - Changing view acls groups to:
2019-02-24 14:41:41 INFO SecurityManager:54 - Changing modify acls groups to:
2019-02-24 14:41:41 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
2019-02-24 14:41:41 INFO Utils:54 - Successfully started service 'sparkDriver' on port 37368.
2019-02-24 14:41:41 INFO SparkEnv:54 - Registering MapOutputTracker
2019-02-24 14:41:41 INFO SparkEnv:54 - Registering BlockManagerMaster
2019-02-24 14:41:41 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2019-02-24 14:41:41 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2019-02-24 14:41:41 INFO DiskBlockManager:54 - Created local directory at /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1551033757513_0011/blockmgr-ea75eeb2-fb84-4d22-8f29-ba4283eb5efc
2019-02-24 14:41:42 INFO MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2019-02-24 14:41:42 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2019-02-24 14:41:42 INFO log:192 - Logging initialized @9697ms
2019-02-24 14:41:43 INFO JettyUtils:54 - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /jobs, /jobs/json, /jobs/job, /jobs/job/json, /stages, /stages/json, /stages/stage, /stages/stage/json, /stages/pool, /stages/pool/json, /storage, /storage/json, /storage/rdd, /storage/rdd/json, /environment, /environment/json, /executors, /executors/json, /executors/threadDump, /executors/threadDump/json, /static, /, /api, /jobs/job/kill, /stages/stage/kill.
2019-02-24 14:41:43 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2019-02-24 14:41:43 INFO Server:419 - Started @10064ms
2019-02-24 14:41:43 INFO AbstractConnector:278 - Started ServerConnector@5d1faff9{HTTP/1.1,[http/1.1]}{0.0.0.0:33181}
2019-02-24 14:41:43 INFO Utils:54 - Successfully started service 'SparkUI' on port 33181.
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3e4dde9a{/jobs,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5b4b2d8b{/jobs/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@36f37180{/jobs/job,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@edf8590{/jobs/job/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@c7ad6b5{/stages,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2128c9cb{/stages/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4ceefc2f{/stages/stage,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3fb4ee4{/stages/stage/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@38cfc530{/stages/pool,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7eff0f35{/stages/pool/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4f9d6ef6{/storage,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@16c8958f{/storage/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@50683423{/storage/rdd,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@56e81fbc{/storage/rdd/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@72262149{/environment,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2010a66f{/environment/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@31c84762{/executors,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@27cbab18{/executors/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@64a4eac1{/executors/threadDump,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@41221be4{/executors/threadDump/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@32a2a7f5{/static,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@32d23207{/,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3808225f{/api,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@35b9f8ea{/jobs/job/kill,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@c552738{/stages/stage/kill,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://weirV1:33181
2019-02-24 14:41:43 INFO YarnClusterScheduler:54 - Created YarnClusterScheduler
2019-02-24 14:41:43 INFO SchedulerExtensionServices:54 - Starting Yarn extension services with app application_1551033757513_0011 and attemptId Some(appattempt_1551033757513_0011_000001)
2019-02-24 14:41:43 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35541.
2019-02-24 14:41:43 INFO NettyBlockTransferService:54 - Server created on weirV1:35541
2019-02-24 14:41:43 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2019-02-24 14:41:43 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, weirV1, 35541, None)
2019-02-24 14:41:43 INFO BlockManagerMasterEndpoint:54 - Registering block manager weirV1:35541 with 366.3 MB RAM, BlockManagerId(driver, weirV1, 35541, None)
2019-02-24 14:41:43 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, weirV1, 35541, None)
2019-02-24 14:41:43 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, weirV1, 35541, None)
2019-02-24 14:41:44 INFO JettyUtils:54 - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /metrics/json.
2019-02-24 14:41:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5e35b086{/metrics/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:44 INFO EventLoggingListener:54 - Logging events to hdfs:/spark-event-log/application_1551033757513_0011_1
2019-02-24 14:41:45 INFO RMProxy:98 - Connecting to ResourceManager at weirv1/80.211.222.23:8030
2019-02-24 14:41:45 INFO YarnRMClient:54 - Registering the ApplicationMaster
2019-02-24 14:41:45 INFO ApplicationMaster:54 -
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
SPARK_YARN_STAGING_DIR -> hdfs://localhost:9000/user/hadoop/.sparkStaging/application_1551033757513_0011
SPARK_USER -> hadoop
command:
{{JAVA_HOME}}/bin/java \
-server \
-Xmx1024m \
'-Dhive.spark.log.dir=/home/hadoop/spark/logs/' \
-Djava.io.tmpdir={{PWD}}/tmp \
'-Dspark.hadoop.hbase.regionserver.info.port=16030' \
'-Dspark.hadoop.hbase.master.info.port=16010' \
'-Dspark.ui.port=0' \
'-Dspark.hadoop.hbase.rest.port=8080' \
'-Dspark.hadoop.hbase.master.port=16000' \
'-Dspark.hadoop.hbase.regionserver.port=16020' \
'-Dspark.driver.port=37368' \
'-Dspark.hadoop.hbase.status.multicast.address.port=16100' \
-Dspark.yarn.app.container.log.dir=<LOG_DIR> \
-XX:OnOutOfMemoryError='kill %p' \
org.apache.spark.executor.CoarseGrainedExecutorBackend \
--driver-url \
spark://CoarseGrainedScheduler@weirV1:37368 \
--executor-id \
<executorId> \
--hostname \
<hostname> \
--cores \
4 \
--app-id \
application_1551033757513_0011 \
--user-class-path \
file:$PWD/__app__.jar \
1><LOG_DIR>/stdout \
2><LOG_DIR>/stderr
resources:
__app__.jar -> resource { scheme: "hdfs" host: "localhost" port: 9000 file: "/user/hadoop/.sparkStaging/application_1551033757513_0011/hive-exec-3.1.1.jar" } size: 40604738 timestamp: 1551037287119 type: FILE visibility: PRIVATE
__spark_libs__ -> resource { scheme: "hdfs" host: "localhost" port: 9000 file: "/spark-jars-nohive" } size: 0 timestamp: 1550932521588 type: ARCHIVE visibility: PUBLIC
__spark_conf__ -> resource { scheme: "hdfs" host: "localhost" port: 9000 file: "/user/hadoop/.sparkStaging/application_1551033757513_0011/__spark_conf__.zip" } size: 623550 timestamp: 1551037288226 type: ARCHIVE visibility: PRIVATE
===============================================================================
2019-02-24 14:41:46 INFO YarnAllocator:54 - Will request 1 executor container(s), each with 4 core(s) and 1194 MB memory (including 170 MB of overhead)
2019-02-24 14:41:46 INFO YarnSchedulerBackend$YarnSchedulerEndpoint:54 - ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@weirV1:37368)
2019-02-24 14:41:46 INFO YarnAllocator:54 - Submitted 1 unlocalized container requests.
2019-02-24 14:41:46 INFO ApplicationMaster:54 - Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
2019-02-24 14:42:13 INFO YarnClusterSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
2019-02-24 14:42:13 INFO YarnClusterScheduler:54 - YarnClusterScheduler.postStartHook done
2019-02-24 14:42:13 INFO SparkContext:54 - Added JAR hdfs://localhost:9000/tmp/hive/hadoop/_spark_session_dir/94aded5e-fbeb-4839-af11-9c5f5902fa0c/hive-exec-3.1.1.jar at hdfs://localhost:9000/tmp/hive/hadoop/_spark_session_dir/94aded5e-fbeb-4839-af11-9c5f5902fa0c/hive-exec-3.1.1.jar with timestamp 1551037333719
2019-02-24 14:42:13 INFO RemoteDriver:306 - Received job request befdba6d-70e5-4a3b-a08e-564376ba3b47
2019-02-24 14:42:14 INFO SparkClientUtilities:107 - Copying hdfs://localhost:9000/tmp/hive/hadoop/_spark_session_dir/94aded5e-fbeb-4839-af11-9c5f5902fa0c/hive-exec-3.1.1.jar to /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1551033757513_0011/container_1551033757513_0011_01_000001/tmp/1551037299410-0/hive-exec-3.1.1.jar
2019-02-24 14:42:14 INFO SparkClientUtilities:71 - Added jar[file:/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1551033757513_0011/container_1551033757513_0011_01_000001/tmp/1551037299410-0/hive-exec-3.1.1.jar] to classpath.
2019-02-24 14:42:16 INFO deprecation:1173 - mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
2019-02-24 14:42:16 INFO Utilities:3298 - Processing alias rentflattoday
2019-02-24 14:42:16 INFO Utilities:3336 - Adding 1 inputs; the first input is hdfs://localhost:9000/user/hive/warehouse/csu.db/rentflattoday
2019-02-24 14:42:16 INFO SerializationUtilities:569 - Serializing MapWork using kryo
2019-02-24 14:42:17 INFO Utilities:633 - Serialized plan (via FILE) - name: Map 1 size: 6.57KB
2019-02-24 14:42:18 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 1216.3 KB, free 365.1 MB)
2019-02-24 14:42:19 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 85.2 KB, free 365.0 MB)
2019-02-24 14:42:19 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on weirV1:35541 (size: 85.2 KB, free: 366.2 MB)
2019-02-24 14:42:19 INFO SparkContext:54 - Created broadcast 0 from Map 1
2019-02-24 14:42:19 INFO Utilities:429 - PLAN PATH = hdfs://localhost:9000/tmp/hive/hadoop/75557489-581b-4292-b43b-1c86c6bcdcb2/hive_2019-02-24_14-41-17_480_8986995693652128044-2/-mr-10004/8b6206d1-557f-4345-ace3-9dfe64d6634b/map.xml
2019-02-24 14:42:19 INFO CombineHiveInputFormat:477 - Total number of paths: 1, launching 1 threads to check non-combinable ones.
2019-02-24 14:42:19 INFO CombineHiveInputFormat:413 - CombineHiveInputSplit creating pool for hdfs://localhost:9000/user/hive/warehouse/csu.db/rentflattoday; using filter path hdfs://localhost:9000/user/hive/warehouse/csu.db/rentflattoday
2019-02-24 14:42:20 INFO FileInputFormat:283 - Total input paths to process : 1
2019-02-24 14:42:20 INFO CombineFileInputFormat:413 - DEBUG: Terminated node allocation with : CompletedNodes: 1, size left: 0
2019-02-24 14:42:20 INFO CombineHiveInputFormat:467 - number of splits 1
2019-02-24 14:42:20 INFO CombineHiveInputFormat:587 - Number of all splits 1
2019-02-24 14:42:20 INFO SerializationUtilities:569 - Serializing ReduceWork using kryo
2019-02-24 14:42:20 INFO Utilities:633 - Serialized plan (via FILE) - name: Reducer 2 size: 3.84KB
2019-02-24 14:42:20 INFO SparkPlan:107 -
Spark RDD Graph:
(1) Reducer 2 (1) MapPartitionsRDD[4] at Reducer 2 []
| Reducer 2 (GROUP, 1) MapPartitionsRDD[3] at Reducer 2 []
| ShuffledRDD[2] at Reducer 2 []
+-(1) Map 1 (1) MapPartitionsRDD[1] at Map 1 []
| Map 1 (rentflattoday, 1) HadoopRDD[0] at Map 1 []
2019-02-24 14:42:20 INFO DAGScheduler:54 - Registering RDD 1 (Map 1)
2019-02-24 14:42:20 INFO DAGScheduler:54 - Got job 0 (Reducer 2) with 1 output partitions
2019-02-24 14:42:20 INFO DAGScheduler:54 - Final stage: ResultStage 1 (Reducer 2)
2019-02-24 14:42:20 INFO DAGScheduler:54 - Parents of final stage: List(ShuffleMapStage 0)
2019-02-24 14:42:20 INFO DAGScheduler:54 - Missing parents: List(ShuffleMapStage 0)
2019-02-24 14:42:20 INFO DAGScheduler:54 - Submitting ShuffleMapStage 0 (Map 1 (1) MapPartitionsRDD[1] at Map 1), which has no missing parents
2019-02-24 14:42:21 INFO MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 293.7 KB, free 364.7 MB)
2019-02-24 14:42:21 INFO MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 88.1 KB, free 364.7 MB)
2019-02-24 14:42:21 INFO BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on weirV1:35541 (size: 88.1 KB, free: 366.1 MB)
2019-02-24 14:42:21 INFO SparkContext:54 - Created broadcast 1 from broadcast at DAGScheduler.scala:1161
2019-02-24 14:42:21 INFO DAGScheduler:54 - Submitting 1 missing tasks from ShuffleMapStage 0 (Map 1 (1) MapPartitionsRDD[1] at Map 1) (first 15 tasks are for partitions Vector(0))
2019-02-24 14:42:21 INFO YarnClusterScheduler:54 - Adding task set 0.0 with 1 tasks
2019-02-24 14:42:36 WARN YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:42:51 WARN YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:43:06 WARN YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:43:21 WARN YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:43:36 WARN YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:43:51 WARN YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:44:06 WARN YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:44:21 WARN YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:44:36 WARN YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:44:51 WARN YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:45:06 WARN YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:45:21 WARN YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:45:36 WARN YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Here are my hive-site.xml and yarn-site.xml:
<configuration>
...
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>spark.master</name>
<value>yarn</value>
</property>
<property>
<name>spark.submit.deployMode</name>
<value>cluster</value>
</property>
<property>
<name>spark.home</name>
<value>/home/hadoop/spark</value>
</property>
<property>
<name>spark.yarn.archive</name>
<value>hdfs:///spark-jars-nohive</value>
</property>
<property>
<name>spark.queue.name</name>
<value>default</value>
</property>
<property>
<name>spark.eventLog.enabled</name>
<value>true</value>
</property>
<property>
<name>spark.eventLog.dir</name>
<value>hdfs:///spark-event-log</value>
</property>
<property>
<name>spark.serializer</name>
<value>org.apache.spark.serializer.KryoSerializer</value>
</property>
<property>
<name>spark.executor.cores</name>
<value>4</value>
</property>
<property>
<name>spark.executor.instances</name>
<value>1</value>
</property>
<property>
<name>spark.dynamicAllocation.enabled</name>
<value>false</value>
</property>
<property>
<name>spark.executor.memory</name>
<value>1024m</value>
</property>
<property>
<name>spark.executor.memoryOverhead</name>
<value>170m</value>
</property>
</configuration>
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>weirv1</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<property>
<description>Amount of physical memory, in MB, that can be allocated for containers.</description>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>3072</value>
</property>
<property>
<description>The minimum allocation size for every container request at the RM, in MBs. Memory requests lower than this won't take effect,
and the specified value will get allocated at minimum.</description>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<description>The maximum allocation size for every container request at the RM, in MBs. Memory requests higher than this won't take effect,
and will get capped to this value.</description>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>3072</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Xmx1638m</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for containers.</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.fair.user-as-default-queue</name>
<value>false</value>
</property>
<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>/home/hadoop/hadoop/etc/hadoop/fair-scheduler.xml</value>
</property>
</configuration>
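(For reference: spark.executor.memory 1024m plus spark.executor.memoryOverhead 170m adds up to the 1194 MB executor container request that YarnAllocator logs above.)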
Since I'm new to this, I assume either that some of these settings are wrong or inconsistent, or that the warnings in the log simply mean my machine is running out of memory. Should I change the memory settings?
Thanks :-)
1 Answer
Since I've figured this out myself, I'm posting the solution here in case anyone stumbles across the same problem. It turned out the machine really was running out of memory: setting yarn.scheduler.minimum-allocation-mb to 512 and spark.executor.memory to 512m fixed it.
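For anyone reproducing this, a sketch of the two changed properties (the values come from the fix above; everything else stays as posted in the question):

In yarn-site.xml:
<property>
  <!-- was 1024; 512 lets YARN grant smaller containers -->
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>

In hive-site.xml:
<property>
  <!-- was 1024m; with the 170m overhead, the executor request drops to roughly 682 MB -->
  <name>spark.executor.memory</name>
  <value>512m</value>
</property>

A plausible reading of why the original settings hung: YARN rounds each container request up to the scheduler's allocation granularity, so the 1194 MB executor request became a 2048 MB container, and together with the ApplicationMaster container it could no longer fit into the 3072 MB of yarn.nodemanager.resource.memory-mb — hence the endless "Initial job has not accepted any resources" warnings.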