Hadoop job hangs in ACCEPTED state, and the ResourceManager log shows java.net.UnknownHostException

Posted by xytpbqjk on 2021-06-02 in Hadoop

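As the title says, I deployed a Hadoop v2.6.3 cluster on an internal network using static IPs such as 10.0.0.x. I then ran a WordCount example program, but the shell printed only the following output and then hung: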

hadoop jar wc.jar WordCount /user/alex/data/kaggle.sample /user/alex/wc/output  
16/04/06 10:44:29 INFO client.RMProxy: Connecting to ResourceManager at master/10.0.0.7:8032
16/04/06 10:44:29 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/04/06 10:44:30 INFO input.FileInputFormat: Total input paths to process : 1
16/04/06 10:44:30 INFO mapreduce.JobSubmitter: number of splits:1
16/04/06 10:44:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1459942813464_0002
16/04/06 10:44:30 INFO impl.YarnClientImpl: Submitted application application_1459942813464_0002
16/04/06 10:44:30 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1459942813464_0002/
16/04/06 10:44:30 INFO mapreduce.Job: Running job: job_1459942813464_0002

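I then went to the Hadoop cluster web UI and found that the job's state was ACCEPTED and that it was not running. I checked the YARN ResourceManager log file; the last error message is as follows: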

2016-04-06 10:34:42,466 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Error trying to assign container token and NM token to an allocated container container_1459942813464_0001_02_000001
java.lang.IllegalArgumentException: java.net.UnknownHostException: worker14.alex
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:256)
at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:220)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens(SchedulerApplicationAttempt.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getAllocation(FiCaSchedulerApp.java:269)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:896)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:937)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:930)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:842)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:823)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: worker14.alex
... 19 more
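
The trace above fails inside `SecurityUtil.buildTokenService` while creating a container token, and the name it could not resolve is `worker14.alex`. As a minimal sketch for collecting every such name from a log file (the function name `unknown_hosts` is hypothetical and not part of Hadoop; the log path is whatever your installation uses), something like the following could be used:

```shell
# unknown_hosts LOGFILE
# Prints each distinct hostname that follows
# "java.net.UnknownHostException: " in LOGFILE, one per line.
# Hypothetical helper for illustration; relies on `grep -o`,
# which GNU and BSD grep both provide.
unknown_hosts() {
    grep -o 'java\.net\.UnknownHostException: [^ ]*' "$1" \
        | sed 's/.*: //' \
        | sort -u
}
```

Calling `unknown_hosts` with the path to the ResourceManager log lists the names that need to be resolvable from the master.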

The Hadoop configuration files are as follows:


# core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020/</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/alex/hadoop-2.6.3/tmp/</value>
    </property>
</configuration>

# yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/home/alex/hadoop-2.6.3/tmp/nm.local</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/home/alex/hadoop-2.6.3/log/nm.log</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

# mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>10.0.0.7:10020</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/home/alex/hadoop-2.6.3/tmp/staging</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/home/alex/hadoop-2.6.3/tmp/mr-history/tmp</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/home/alex/hadoop-2.6.3/tmp/mr-history/done</value>
    </property>
</configuration>
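
The `/etc/hosts` file maps the IPs to master and worker1-worker14. The `slaves` file lists master and worker1-worker14.
Something seems to be wrong with my hostname: it is `worker14.alex` rather than `worker14` (`alex` is my Linux username).
What is wrong with my configuration? Do I need to restart all the servers, or do I only need to restart some services, e.g. `service networking restart`?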
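
One way to check for this kind of mismatch, sketched under assumptions: the helper `in_hosts_file` below is hypothetical and only tests whether a given name appears as a hostname or alias in a hosts-format file. Running it with the name each worker reports via `hostname` would show whether a name such as `worker14.alex` is missing from `/etc/hosts`.

```shell
# in_hosts_file NAME [FILE]
# Returns 0 if NAME appears as a hostname or alias in FILE
# (default: /etc/hosts), ignoring comments; returns 1 otherwise.
# Hypothetical helper for illustration only.
in_hosts_file() {
    awk -v name="$1" '
        { sub(/#.*/, "") }               # strip trailing comments
        { for (i = 2; i <= NF; i++)      # field 1 is the IP address
              if ($i == name) found = 1 }
        END { exit !found }              # exit 0 only if a match was seen
    ' "${2:-/etc/hosts}"
}

# Example (commented out so sourcing this file has no side effects):
# name=$(hostname)
# in_hosts_file "$name" || echo "$name is not listed in /etc/hosts"
```

The check is deliberately limited to the hosts file; it does not consult DNS or any other name service that the system resolver may also use.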
