Error reading the map output of map tasks on other slaves — what should the /etc/hosts file actually contain on Ubuntu machines running a Hadoop cluster?

Asked by w3nuxt5m on 2021-06-04, in Hadoop

I have set up a 2-node Hadoop cluster (on virtual machines). After successfully starting the DFS and MapReduce daemons, I ran one of the Hadoop example programs, and it slowed down after printing the following to the terminal:

Number of Maps = 4 Samples per Map = 10000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Starting Job
13/06/10 21:36:43 INFO mapred.FileInputFormat: Total input paths to process : 4
13/06/10 21:36:43 INFO mapred.FileInputFormat: Total input paths to process : 4
13/06/10 21:36:43 INFO mapred.JobClient: Running job: job_201306101254_0005
13/06/10 21:36:44 INFO mapred.JobClient:  map 0% reduce 0%
13/06/10 21:36:49 INFO mapred.JobClient:  map 75% reduce 0%
13/06/10 21:36:50 INFO mapred.JobClient:  map 100% reduce 0%

So the map tasks complete correctly. I looked at the task attempt logs, in particular the reduce task's attempt log, and confirmed that the reduce task cannot read the map output produced on the other slave. The error is:


13/06/11 01:55:45 WARN mapred.JobClient: Error reading task outputhttp://hadoop-desk.localdomain:50060/tasklog?plaintext=true&taskid=attempt_201306110154_0001_m_000000_0&filter=stdout

13/06/11 01:55:45 WARN mapred.JobClient: Error reading task outputhttp://hadoop-desk.localdomain:50060/tasklog?plaintext=true&taskid=attempt_201306110154_0001_m_000000_0&filter=stderr
13/06/11 01:55:49 INFO mapred.JobClient:  map 75% reduce 16%

As a result, the map task that produced this map output is considered failed and is rescheduled on a different slave (the one running the reduce), which slows the whole job down. I believe the cause is Ubuntu's /etc/hosts file:

127.0.0.1   localhost
127.0.1.1   hadoop-desk.localdomain hadoop-desk
192.168.196.128 master
192.168.196.129 slave

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
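A quick way to see what the node's own hostname actually resolves to — a diagnostic sketch, assuming standard Ubuntu tools:

# Run on each node. If this prints 127.0.1.1, the TaskTracker advertises
# its map-output URLs on the loopback interface, which is unreachable
# from the other slaves.
hostname                    # e.g. hadoop-desk
getent hosts "$(hostname)"  # expected: the node's real 192.168.196.x address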

I get the same error even after deleting the localhost line:

127.0.0.1   localhost

I then removed this line as well:

127.0.1.1   hadoop-desk.localdomain hadoop-desk

and after that I get the following error:

Number of Maps = 4 Samples per Map = 10000
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.SafeModeException: Cannot delete /user/hadoop-user/test-mini-mr. Name node is in safe mode.
The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
    at org.apache.hadoop.dfs.FSNamesystem.deleteInternal(FSNamesystem.java:1494)
    at org.apache.hadoop.dfs.FSNamesystem.delete(FSNamesystem.java:1466)
    at org.apache.hadoop.dfs.NameNode.delete(NameNode.java:425)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)

    at org.apache.hadoop.ipc.Client.call(Client.java:715)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at org.apache.hadoop.dfs.$Proxy0.delete(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy0.delete(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient.delete(DFSClient.java:529)
    at org.apache.hadoop.dfs.DistributedFileSystem.delete(DistributedFileSystem.java:192)
    at org.apache.hadoop.examples.PiEstimator.launch(PiEstimator.java:188)
    at org.apache.hadoop.examples.PiEstimator.run(PiEstimator.java:245)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:252)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:53)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Answer by piok6c0g:

Ubuntu 8 (I am not sure about later versions) has this entry in its /etc/hosts file:

127.0.1.1   yourhostname.localdomain yourhostname

It is this entry that causes the problem, so comment out that line. This is my /etc/hosts file:

127.0.0.1   localhost

# 127.0.1.1  hadoop-desk.localdomain hadoop-desk

192.168.196.128 master
192.168.196.129 slave

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
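The same corrected file needs to be in place on every node in the cluster; a sketch for pushing it to the slave (the hadoop-user account name and sudo access here are assumptions, not from the original setup):

# Copy the corrected hosts file to the other node (user/paths assumed):
scp /etc/hosts hadoop-user@slave:/tmp/hosts
ssh hadoop-user@slave 'sudo mv /tmp/hosts /etc/hosts'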

After making this change, the following error appears:

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.SafeModeException: <Cannot delete /user/hadoop-user/test-mini-mr>. Name node is in safe mode.

To avoid this error, change the dfs.safemode.threshold.pct parameter in hdfs-default.xml from 0.999f to 0.0f:

<property>
  <name>dfs.safemode.threshold.pct</name>
  <value>0.00f</value>
  <description>
    Specifies the percentage of blocks that should satisfy 
    the minimal replication requirement defined by dfs.replication.min.
    Values less than or equal to 0 mean not to wait for any particular
    percentage of blocks before exiting safemode.
    Values greater than 1 will make safe mode permanent.
  </description>
</property>
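Alternatively, rather than lowering the threshold, the namenode can be inspected and taken out of safe mode by hand; a sketch using the stock dfsadmin commands of this Hadoop generation:

# Inspect, wait out, or force-exit safe mode on the namenode:
hadoop dfsadmin -safemode get    # report whether safe mode is on
hadoop dfsadmin -safemode wait   # block until safe mode is exited
hadoop dfsadmin -safemode leave  # force the namenode out of safe mode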

After this I was still getting the "Error reading task output" error (for map output on a different slave). My master's and slave's hostnames were identical, so I changed them to "master" and "slave" respectively by editing each machine's /etc/hostname file. Since /etc/hosts already contained entries for "master" and "slave", I no longer got any errors.
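A minimal sketch of that hostname change (run as root on each machine; on older Ubuntu releases the new name otherwise takes effect at the next reboot):

# On the first machine:
echo master > /etc/hostname
hostname master   # apply immediately, without rebooting
# On the second machine:
echo slave > /etc/hostname
hostname slave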
