It is evident from the error message that something went wrong while saving a replica of a particular block of the file. The likely cause is a problem reaching the DataNode that was supposed to store that block (replica). See the full log below:
I found another user, huasanyelao (https://stackoverflow.com/users/987275/huasanyelao), who hit a similar exception/problem, although with a different use case.
Now, how do we go about fixing these issues? I understand there is no single fix that covers every situation.
1. What immediate steps should I take to fix this kind of error?
2. For jobs whose logs I was not monitoring at the time, what approach should I take to troubleshoot the failures?
P.S.: Apart from fixing network or access problems, what other approaches should I follow?
Error log:
15/04/10 11:21:13 INFO impl.TimelineClientImpl: Timeline service address: http://your-name-node/ws/v1/timeline/
15/04/10 11:21:14 INFO client.RMProxy: Connecting to ResourceManager at your-name-node/xxx.xx.xxx.xx:0000
15/04/10 11:21:34 WARN hdfs.DFSClient: DataStreamer Exception
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:29)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:512)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1516)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1318)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1272)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
15/04/10 11:21:40 INFO hdfs.DFSClient: Could not complete /user/xxxxx/.staging/job_11111111111_1212/job.jar retrying...
15/04/10 11:21:46 INFO hdfs.DFSClient: Could not complete /user/xxxxx/.staging/job_11111111111_1212/job.jar retrying...
15/04/10 11:21:59 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/xxxxx/.staging/job_11111111111_1212
Error occured in MapReduce process:
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:54)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1903)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1871)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1836)
at org.apache.hadoop.mapreduce.JobSubmitter.copyJar(JobSubmitter.java:286)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:254)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at com.xxx.xxx.xxxx.driver.GenerateMyFormat.runMyExtract(GenerateMyFormat.java:222)
at com.xxx.xxx.xxxx.driver.GenerateMyFormat.run(GenerateMyFormat.java:101)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.xxx.xxx.xxxx.driver.GenerateMyFormat.main(GenerateMyFormat.java:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
1 Answer
We had a similar problem. It mostly came down to dfs.namenode.handler.count being set too low. On some small clusters simply increasing this value may help, but in our case it was essentially a DoS situation: the NameNode could not keep up with the number of connections and RPC calls, and the backlog of pending block deletions grew huge. Check the HDFS audit log to see whether any mass deletions or other heavy HDFS operations were happening, and match them against the jobs that could have been running at the time. Stopping those jobs will help HDFS recover.
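For reference, here is a minimal sketch of the two checks described above, assuming a fairly standard setup. dfs.namenode.handler.count is the real property name, but the audit-log path shown is only a typical default and the right handler count depends on your cluster size, so treat the specifics as placeholders:
# Check the current NameNode handler count (the property is set in hdfs-site.xml
# on the NameNode; raising it requires a NameNode restart).
hdfs getconf -confKey dfs.namenode.handler.count
# Count delete operations per user in the HDFS audit log to spot mass deletions.
# The path below is a hypothetical default -- it depends on your log4j configuration.
grep 'cmd=delete' /var/log/hadoop-hdfs/hdfs-audit.log \
  | grep -o 'ugi=[^ ]*' | sort | uniq -c | sort -rn | head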