Adding a data node to a Hadoop cluster

kognpnkq  posted 2022-11-01 in Hadoop
Follow (0) | Answers (6) | Views (225)

When I start hadoopnode1 with start-all.sh, it successfully starts the services on both the master and the slave (see the jps output from the slave). But when I look at the live nodes in the admin screen, the slave node does not show up. Even the hadoop fs -ls / command runs perfectly from the master, but from the slave it shows this error message:

hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ hadoop fs -ls /
12/05/28 01:14:20 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 0 time(s).
12/05/28 01:14:21 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 1 time(s).
12/05/28 01:14:22 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 2 time(s).
12/05/28 01:14:23 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 3 time(s).
.
.
.
12/05/28 01:14:29 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 10 time(s).

The slave (hadoopnode2) does not seem to be able to find/connect to the master node (hadoopnode1).
Please point out what I am missing.
Here are the settings of the master and slave nodes. P.S. the master and slave run the same version of Linux and Hadoop, and SSH works fine, since I can start the slave from the master.
Also, the settings in core-site.xml, hdfs-site.xml and mapred-site.xml are identical on the master (hadoopnode1) and the slave (hadoopnode2).
OS - Ubuntu 10; Hadoop version -

hadoop@hadoopnode1:~/hadoop-0.20.2/conf$ hadoop version
Hadoop 0.20.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010

-- master (hadoopnode1)

hadoop@hadoopnode1:~/hadoop-0.20.2/conf$ uname -a
Linux hadoopnode1 2.6.35-32-generic #67-Ubuntu SMP Mon Mar 5 19:35:26 UTC 2012 i686 GNU/Linux

hadoop@hadoopnode1:~/hadoop-0.20.2/conf$ jps
9923 Jps
7555 NameNode
8133 TaskTracker
7897 SecondaryNameNode
7728 DataNode
7971 JobTracker

masters -> hadoopnode1
slaves -> hadoopnode1
hadoopnode2

-- slave (hadoopnode2)

hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ uname -a
Linux hadoopnode2 2.6.35-32-generic #67-Ubuntu SMP Mon Mar 5 19:35:26 UTC 2012 i686 GNU/Linux

hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ jps
1959 DataNode
2631 Jps
2108 TaskTracker

masters -> hadoopnode1

core-site.xml
hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/var/tmp/hadoop/hadoop-${user.name}</value>
                <description>A base for other temp directories</description>
        </property>

        <property>
                <name>fs.default.name</name>
                <value>hdfs://hadoopnode1:8020</value>
                <description>The name of the default file system</description>
        </property>

</configuration>

hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>mapred.job.tracker</name>
                <value>hadoopnode1:8021</value>
                <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task</description>
        </property>
</configuration>

hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
                <description>Default block replication</description>
        </property>
</configuration>

dxxyhpgq1#

Check your services with sudo jps; the master should be displaying them all. If it is not, here is what you need to do:

1. Restart Hadoop
2. Go to /app/hadoop/tmp/dfs/name/current
3. Open VERSION (i.e. by vim VERSION)
4. Record namespaceID
5. Go to /app/hadoop/tmp/dfs/data/current
6. Open VERSION (i.e. by vim VERSION)
7. Replace the namespaceID with the namespaceID you recorded in step 4.

This should work. Good luck!
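The edit in steps 2-7 above amounts to copying the namenode's namespaceID into the datanode's VERSION file. A minimal sketch, demonstrated on scratch copies under /tmp (on a real node the files live under hadoop.tmp.dir, e.g. /app/hadoop/tmp/dfs/name/current/VERSION and /app/hadoop/tmp/dfs/data/current/VERSION; the ID values here are made up):

```shell
# Scratch copies standing in for the real dfs/name and dfs/data directories
mkdir -p /tmp/nsfix/name/current /tmp/nsfix/data/current
echo 'namespaceID=123456789' > /tmp/nsfix/name/current/VERSION   # namenode's ID
echo 'namespaceID=987654321' > /tmp/nsfix/data/current/VERSION   # stale datanode ID

# Read the namenode's namespaceID and write it into the datanode's VERSION
NSID=$(grep '^namespaceID=' /tmp/nsfix/name/current/VERSION | cut -d= -f2)
sed -i "s/^namespaceID=.*/namespaceID=${NSID}/" /tmp/nsfix/data/current/VERSION
cat /tmp/nsfix/data/current/VERSION   # now matches the namenode
```

Stop the datanode before editing the real file, and start it again afterwards.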


5tmbdcev2#

In the web GUI you can see the number of nodes the cluster has. If you see fewer nodes than you expected, make sure that the /etc/hosts file on the master contains only the cluster hosts as entries (for a 2-node cluster):

192.168.0.1 master
192.168.0.2 slave

If you see any 127.0.1.1... IP entries, comment them out, because Hadoop will see them first as the host(s).
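The comment-out step can be sketched with sed; shown here on a copy in /tmp (on the real machines you would edit /etc/hosts itself, with sudo), using the example hostnames and addresses from this answer:

```shell
# Sample hosts file with the problematic 127.0.1.1 entry Ubuntu often adds
cat > /tmp/hosts.demo <<'EOF'
127.0.0.1   localhost
127.0.1.1   master
192.168.0.1 master
192.168.0.2 slave
EOF

# Comment out the 127.0.1.1 line so "master" resolves to the LAN address
sed -i 's/^127\.0\.1\.1/# 127.0.1.1/' /tmp/hosts.demo
grep master /tmp/hosts.demo
```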


azpvetkf3#

Check the namenode and datanode logs (they should be in $HADOOP_HOME/logs/). The most likely problem is that the namenode and datanode IDs do not match. Delete hadoop.tmp.dir from all nodes, format the namenode again ($HADOOP_HOME/bin/hadoop namenode -format), and then try again.


wwtsj6pe4#

I think slave 2 is supposed to be listening on the same port 8020, and not listening at 8021.


qij5mzcb5#

Add the new node's hostname to the slaves file, then start the data node and task tracker on the new node.


qhhrdooz6#

In your case there are indeed two errors.

can't connect to hadoop master node from slave

That is a network problem. Test it: curl 192.168.1.120:8020.
Normal response: curl: (52) Empty reply from server
In my situation I got a host not found error, so just take a look at your firewall settings.
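If curl is not installed, a bash-only reachability check works too, via bash's /dev/tcp redirection (host and port below are the ones from the question; "open" means something accepted the TCP connection, not that HDFS is healthy):

```shell
# Try to open a TCP connection to the namenode port, with a 2-second timeout
HOST=192.168.1.120 PORT=8020
if timeout 2 bash -c "exec 3<>/dev/tcp/$HOST/$PORT" 2>/dev/null; then
  RESULT=open
else
  RESULT="closed or unreachable"   # firewall, wrong bind address, or service down
fi
echo "$RESULT"
```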

data node down:

That is a Hadoop problem. Raze2dust's method is good. Here is another way, if you see the incompatible namespaceIDs error in your log:
Stop hadoop, edit the value of namespaceID in /current/VERSION to match the value of the current namenode, then start hadoop.
You can always check the available data nodes with: hadoop fsck /
