I am trying to set up a sample Hadoop cluster on OpenShift/Kubernetes/Docker (OpenShift 3.5) and I am running into the following problem:
Only one datanode at a time can stay registered with the namenode, because the namenode sees all datanodes under the same IP (192.168.20.1, the pods' default gateway; see the ip route output below). This is apparently caused by the network routing inside the cluster.
Actual sample setup:
Namenode
192.168.20.119 hadoop-namenode-10-qp83z
Datanodes
192.168.20.132 hadoop-slave-0.hadoop-slave.my-project.svc.cluster.local hadoop-slave-0
192.168.20.133 hadoop-slave-1.hadoop-slave.my-project.svc.cluster.local hadoop-slave-1
192.168.20.134 hadoop-slave-2.hadoop-slave.my-project.svc.cluster.local hadoop-slave-2
Namenode log:
17/12/05 22:11:21 INFO net.NetworkTopology: Removing a node: /default-rack/192.168.20.1:50010
17/12/05 22:11:21 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.20.1:50010
17/12/05 22:11:21 INFO blockmanagement.BlockReportLeaseManager: Registered DN f3c22144-f9cf-47dc-b0b7-bf946121ee81 (192.168.20.1:50010).
17/12/05 22:11:21 INFO blockmanagement.DatanodeDescriptor: Adding new storage ID DS-6f7b2565-1e85-491a-ab04-69a7ffa25d5c for DN 192.168.20.1:50010
17/12/05 22:11:21 INFO BlockStateChange: BLOCK* processReport 0x9c1289bc1f9f766f: Processing first storage report for DS-6f7b2565-1e85-491a-ab04-69a7ffa25d5c from datanode f3c22144-f9cf-47dc-b0b7-bf946121ee81
17/12/05 22:11:21 INFO BlockStateChange: BLOCK* processReport 0x9c1289bc1f9f766f: from storage DS-6f7b2565-1e85-491a-ab04-69a7ffa25d5c node DatanodeRegistration(192.168.20.1, datanodeUuid=f3c22144-f9cf-47dc-b0b7-bf946121ee81, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-6b84af8f-fe9a-465a-840e-6acb0fe5f8d9;nsid=399770301;c=0), blocks: 0, hasStaleStorage: false, processing time: 0 msecs, invalidatedBlocks: 0
17/12/05 22:11:21 INFO hdfs.StateChange: BLOCK* registerDatanode: from DatanodeRegistration(192.168.20.1, datanodeUuid=2bd926b9-b00e-4eb6-858d-3e90fa6b3ef8, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-6b84af8f-fe9a-465a-840e-6acb0fe5f8d9;nsid=399770301;c=0) storage 2bd926b9-b00e-4eb6-858d-3e90fa6b3ef8
17/12/05 22:11:21 INFO namenode.NameNode: BLOCK* registerDatanode: 192.168.20.1:50010
Configuration (hdfs-site.xml):
<property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value> <!-- same result with false -->
</property>
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value> <!-- same result with false -->
</property>
<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
</property>
Output of ip route on all pods:
ip route
default via 192.168.20.1 dev eth0
192.168.0.0/16 dev eth0
192.168.20.0/24 dev eth0 proto kernel scope link src 192.168.20.134
224.0.0.0/4 dev eth0
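A quick way to confirm the routing/NAT theory (a hypothetical check, assuming the ss utility is available inside the namenode pod) is to look at the peer addresses of the established connections on the namenode's IPC port:

# Run inside the namenode pod; 8020 is the default namenode IPC port,
# adjust if fs.defaultFS uses a different one.
ss -tn state established '( sport = :8020 )'
# If the SDN is SNAT'ing pod-to-pod traffic, every datanode connection
# shows the gateway 192.168.20.1 as its peer instead of 192.168.20.132-134.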
This problem is strikingly similar to the one described in "Why is the dockerized Hadoop datanode registering with the wrong IP address?", but now in the context of a Kubernetes cluster.
Any ideas?
1 answer
Does this help?
Before scaling down the datanode StatefulSet, you need to tell Hadoop that a datanode is going away ;) A sketch of that decommissioning step follows after the links below.
See http://b4mad.net/datenbrei/openshift/hadoop-hdfs/ and also https://gitlab.com/goern/hdfs-openshift
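For completeness, a minimal sketch of that decommissioning step (the exclude-file path is an assumption; it presumes dfs.hosts.exclude is set in hdfs-site.xml on the namenode and points at a readable file):

# Add the datanode that is about to be removed to the namenode's exclude file
# (/etc/hadoop/dfs.exclude is a placeholder; use whatever dfs.hosts.exclude points at).
echo "hadoop-slave-2.hadoop-slave.my-project.svc.cluster.local" >> /etc/hadoop/dfs.exclude

# Make the namenode re-read its include/exclude lists and start decommissioning.
hdfs dfsadmin -refreshNodes

# Wait until the node is reported as "Decommissioned", i.e. its blocks
# have been re-replicated to the remaining datanodes.
hdfs dfsadmin -report

# Only then shrink the StatefulSet so Kubernetes removes the pod.
oc scale statefulset hadoop-slave --replicas=2

Decommissioning first matters because HDFS would otherwise only notice the missing datanode once it is declared dead, leaving blocks under-replicated during the scale-down.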