Slave VM removed from the slaves list is still accessed by YARN/Tez

2izufjch posted on 2021-06-02 in Hadoop

So I removed vm4 from the list of slave VMs, and when I run the command below, it no longer accesses it:

hdfs dfsadmin -report

The result is:

ubuntu@anmol-vm1-new:~$ hdfs dfsadmin -report
15/12/14 06:56:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 1268169326592 (1.15 TB)
Present Capacity: 1199270457337 (1.09 TB)
DFS Remaining: 1199213064192 (1.09 TB)
DFS Used: 57393145 (54.73 MB)
DFS Used%: 0.00%
Under replicated blocks: 27
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)

Live datanodes:
Name: 10.0.1.191:50010 (anmol-vm2-new)
Hostname: anmol-vm2-new
Decommission Status : Normal
Configured Capacity: 422723108864 (393.69 GB)
DFS Used: 19005440 (18.13 MB)
Non DFS Used: 21501829120 (20.03 GB)
DFS Remaining: 401202274304 (373.65 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.91%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Mon Dec 14 06:56:12 UTC 2015

Name: 10.0.1.190:50010 (anmol-vm1-new)
Hostname: anmol-vm1-new
Decommission Status : Normal
Configured Capacity: 422723108864 (393.69 GB)
DFS Used: 19369984 (18.47 MB)
Non DFS Used: 25831350272 (24.06 GB)
DFS Remaining: 396872388608 (369.62 GB)
DFS Used%: 0.00%
DFS Remaining%: 93.88%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Mon Dec 14 06:56:13 UTC 2015

Name: 10.0.1.192:50010 (anmol-vm3-new)
Hostname: anmol-vm3-new
Decommission Status : Normal
Configured Capacity: 422723108864 (393.69 GB)
DFS Used: 19017721 (18.14 MB)
Non DFS Used: 21565689863 (20.08 GB)
DFS Remaining: 401138401280 (373.59 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.89%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Mon Dec 14 06:56:11 UTC 2015
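Keep in mind that this report only reflects what the NameNode (HDFS) knows. The YARN ResourceManager tracks NodeManagers separately, so a host missing from this list can still be known to YARN. If you want to check the HDFS side in a script, here is a minimal Python sketch. It assumes the report's section headers start with "Live datanodes" and "Dead datanodes", and `live_datanodes` is a hypothetical helper name:

```python
from typing import List


def live_datanodes(report: str) -> List[str]:
    """Collect the Hostname: entries listed under the live-datanode section
    of `hdfs dfsadmin -report` output."""
    hosts: List[str] = []
    in_live = False
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("Live datanodes"):
            in_live = True  # entering the live section
        elif line.startswith("Dead datanodes"):
            in_live = False  # hosts in the dead section are not collected
        elif in_live and line.startswith("Hostname:"):
            hosts.append(line.split(":", 1)[1].strip())
    return hosts


# A trimmed copy of the report shown above.
sample = """Datanodes available: 3 (3 total, 0 dead)

Live datanodes:
Name: 10.0.1.191:50010 (anmol-vm2-new)
Hostname: anmol-vm2-new
Name: 10.0.1.190:50010 (anmol-vm1-new)
Hostname: anmol-vm1-new
Name: 10.0.1.192:50010 (anmol-vm3-new)
Hostname: anmol-vm3-new
"""

print(live_datanodes(sample))  # ['anmol-vm2-new', 'anmol-vm1-new', 'anmol-vm3-new']
```

On this output, anmol-vm4-new does not appear, which matches what the question describes.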

But at some point YARN still tries to access it. This is the log I get:

yarn logs -applicationId application_1450050523156_0009

http://pastebin.com/uvhnkrrp

Service org.apache.tez.dag.app.rm.TaskScheduler failed in state STARTED; cause: java.lang.IllegalArgumentException: java.net.UnknownHostException: anmol-vm4-new
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
        at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145)
        at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136)
        at org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM.createAndGetOptimisticNMToken(NMTokenSecretManagerInRM.java:325)
        at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:297)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90)
        at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2014)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2010)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1561)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2008)
Caused by: java.net.UnknownHostException: anmol-vm4-new
        ... 15 more
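The trace suggests that the ResourceManager still has a node entry for anmol-vm4-new. The failure happens while it builds an NM token (`createAndGetOptimisticNMToken` → `SecurityUtil.buildTokenService`), because that hostname no longer resolves. To see which host names fail to resolve from a given machine, you could use a small Python helper like the one below. `unresolvable_hosts` is a hypothetical name, and the resolver is passed in as a parameter so the function can be tested without DNS:

```python
import socket
from typing import Callable, Iterable, List


def unresolvable_hosts(hosts: Iterable[str],
                       resolve: Callable[[str], str] = socket.gethostbyname) -> List[str]:
    """Return the host names that `resolve` cannot map to an address."""
    bad: List[str] = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror (lookup failure) subclasses OSError
            bad.append(host)
    return bad
```

On the master you would call it with the hostnames you expect, e.g. `unresolvable_hosts(["anmol-vm1-new", "anmol-vm4-new"])`. If anmol-vm4-new comes back, the lookup fails the same way the JVM's `UnknownHostException` does.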

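Any idea why it is accessing vm4? It is not in the slaves list. How can I fix this?
Update: I did the following, but I still get an error because it tries to access vm4:
1) Added the files exclude and mapred.exclude to the conf directory of yarnpp, containing the private IP address of vm4.
2) Added this to mapred-site.xml: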

<property>
    <name>mapred.hosts.exclude</name>
    <value>/home/hadoop/yarnpp/conf/mapred.exclude</value>
    <description>Names a file that contains the list of hosts that
      should be excluded by the jobtracker.  If the value is empty, no
      hosts are excluded.</description>
</property>

3) Added this to hdfs-site.xml:

<property>
 <name>dfs.hosts.exclude</name>
 <value>/home/hadoop/yarnpp/conf/exclude</value>
 <final>true</final>
</property>

3.5) Added this to yarn-site.xml:

<property>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value>/home/hadoop/yarnpp/conf/exclude</value>
    <description>Path to file with nodes to exclude.</description>
</property>

4) Ran cp_host.sh to copy the conf directory to all the slaves!
5) Ran the reboot_everything script (stop-all.sh, format, and start-all.sh).
6) Ran hadoop dfsadmin -refreshNodes
7) Ran this command on the master VM:

yarn rmadmin -refreshNodes
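To double-check steps 1 to 3.5 on each node, you could read the exclude paths back out of the site files and compare them with what the exclude file actually contains. Here is a minimal Python sketch. It assumes the standard Hadoop `<configuration>`/`<property>` XML layout, and `get_property` and `excluded_hosts` are hypothetical helper names. One thing to watch for: the `mapred.hosts.exclude` description above says that file is read by the JobTracker. For YARN, the entry that matters is `yarn.resourcemanager.nodes.exclude-path`, and for HDFS it is `dfs.hosts.exclude`.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Set


def get_property(site_xml: str, name: str) -> Optional[str]:
    """Return the <value> of the <property> called `name`, or None if absent."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):  # iter() also visits the root element itself
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def excluded_hosts(exclude_text: str) -> Set[str]:
    """Hosts listed one per line in an exclude file (blank lines ignored)."""
    return {line.strip() for line in exclude_text.splitlines() if line.strip()}


yarn_site = """<configuration>
  <property>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value>/home/hadoop/yarnpp/conf/exclude</value>
  </property>
</configuration>"""

print(get_property(yarn_site, "yarn.resourcemanager.nodes.exclude-path"))
# /home/hadoop/yarnpp/conf/exclude
```

Running the same check against hdfs-site.xml confirms that both daemons point at the same file. `excluded_hosts` then shows whether vm4's entry is really in it.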

Here is the new log: http://pastebin.com/ckpy9gmb
Also, even though vm4 is not in the VM list, it still shows up here:

Now when I run the gridmix-generate.sh job, I get this error:

15/12/14 10:14:53 INFO ipc.Client: Retrying connect to server: anmol-vm3-new/10.0.1.192:50833. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Answer #1 by 7kqas0il

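After chatting with Monal, the issue has been resolved.
When you run stop-all.sh, sometimes not all processes actually stop. It is good practice to run ps -ef afterwards to make sure that every process has stopped on every node. Monal had already run stop-all.sh, but ps -ef|grep -i datanode was still showing results.
Then, in the chat, I asked her to reboot all the VMs, which cleans up any dangling processes. The hard reboot solved the problem.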
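The check described in the answer, confirming that no Hadoop daemon survived stop-all.sh, can be scripted per node. Here is a minimal Python sketch. `leftover_hadoop_processes` and the daemon list are illustrative assumptions, and the sample `ps -ef` lines are made up for the example. On a real node, the input would be `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`.

```python
from typing import Iterable, List

# Illustrative, not exhaustive: lowercase substrings that identify Hadoop daemons.
DAEMONS = ("datanode", "namenode", "nodemanager", "resourcemanager")


def leftover_hadoop_processes(ps_output: str,
                              daemons: Iterable[str] = DAEMONS) -> List[str]:
    """Return `ps -ef` lines that mention a Hadoop daemon (case-insensitive),
    skipping grep's own line, which `ps -ef | grep -i datanode` usually shows."""
    names = [d.lower() for d in daemons]
    hits: List[str] = []
    for line in ps_output.splitlines():
        lower = line.lower()
        if "grep" in lower:
            continue  # the grep command itself is not a leftover daemon
        if any(name in lower for name in names):
            hits.append(line)
    return hits


# Made-up sample: one stray DataNode JVM plus the grep line itself.
sample = (
    "hadoop  2101     1  2 06:50 ?      00:01:10 java org.apache.hadoop.hdfs.server.datanode.DataNode\n"
    "ubuntu  3344  3001  0 07:02 pts/0  00:00:00 grep -i datanode\n"
)

for line in leftover_hadoop_processes(sample):
    print(line)  # only the DataNode JVM line is printed
```

If any line comes back after stop-all.sh, that node still has a running daemon. Killing it, or rebooting as in the answer, clears it before restarting the cluster.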
