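I am running a Hadoop cluster with 24 servers. It has been running for several months, but since the last restart the DataNodes keep shutting down with the following error: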
2016-02-05 11:35:56,615 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.133:50010, dest: /192.168.0.133:40786, bytes: 118143861, op: HDFS_READ, cliID: DFSClient_attempt_1454667838939_0001_m_000330_0_-1595784897_1, offset: 0, srvID: 6522904d-0698-4794-af45-613a0492753c, blockid: BP-2025286576-192.168.0.93-1414492170010:blk_1076219758_2486790, duration: 21719288540
2016-02-05 11:35:56,755 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.133:50010, dest: /192.168.0.133:40784, bytes: 118297616, op: HDFS_READ, cliID: DFSClient_attempt_1454667838939_0001_m_000231_0_-1089799971_1, offset: 0, srvID: 6522904d-0698-4794-af45-613a0492753c, blockid: BP-2025286576-192.168.0.93-1414492170010:blk_1076221376_2488408, duration: 22149605332
2016-02-05 11:35:56,837 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.133:50010, dest: /192.168.0.133:40780, bytes: 118345914, op: HDFS_READ, cliID: DFSClient_attempt_1454667838939_0001_m_000208_0_-2005378882_1, offset: 0, srvID: 6522904d-0698-4794-af45-613a0492753c, blockid: BP-2025286576-192.168.0.93-1414492170010:blk_1076231364_2498422, duration: 22460210591
2016-02-05 11:35:57,359 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.133:50010, dest: /192.168.0.133:40781, bytes: 118419792, op: HDFS_READ, cliID: DFSClient_attempt_1454667838939_0001_m_000184_0_406014429_1, offset: 0, srvID: 6522904d-0698-4794-af45-613a0492753c, blockid: BP-2025286576-192.168.0.93-1414492170010:blk_1076221071_2488103, duration: 22978732747
2016-02-05 11:35:58,008 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.133:50010, dest: /192.168.0.133:40787, bytes: 118151696, op: HDFS_READ, cliID: DFSClient_attempt_1454667838939_0001_m_000324_0_-608122320_1, offset: 0, srvID: 6522904d-0698-4794-af45-613a0492753c, blockid: BP-2025286576-192.168.0.93-1414492170010:blk_1076222362_2489394, duration: 23063230631
2016-02-05 11:36:00,295 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.133:50010, dest: /192.168.0.133:40776, bytes: 123206293, op: HDFS_READ, cliID: DFSClient_attempt_1454667838939_0001_m_000015_0_-846180274_1, offset: 0, srvID: 6522904d-0698-4794-af45-613a0492753c, blockid: BP-2025286576-192.168.0.93-1414492170010:blk_1076244668_2511731, duration: 26044953281
2016-02-05 11:36:00,407 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.133:50010, dest: /192.168.0.133:40764, bytes: 123310419, op: HDFS_READ, cliID: DFSClient_attempt_1454667838939_0001_m_000010_0_-310980548_1, offset: 0, srvID: 6522904d-0698-4794-af45-613a0492753c, blockid: BP-2025286576-192.168.0.93-1414492170010:blk_1076244751_2511814, duration: 26288883806
2016-02-05 11:36:01,371 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.133:50010, dest: /192.168.0.133:40783, bytes: 119653309, op: HDFS_READ, cliID: DFSClient_attempt_1454667838939_0001_m_000055_0_-558109635_1, offset: 0, srvID: 6522904d-0698-4794-af45-613a0492753c, blockid: BP-2025286576-192.168.0.93-1414492170010:blk_1076222182_2489214, duration: 26808381782
2016-02-05 11:36:05,224 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
2016-02-05 11:36:05,230 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at computer75/192.168.0.133
************************************************************/
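Every time I restart the cluster it comes up fine and all nodes are up. But a few seconds after I start a MapReduce job, some nodes die with this error. The nodes that die are different each time.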
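To compare which nodes die and when, the DataNode logs can be scanned for the signal line shown in the excerpt above. The following is only a minimal sketch: the function name `find_signals` and the regular expression are my own assumptions, based solely on the log format in the excerpt, and the script does not explain who sends the SIGTERM.

```python
import re
from typing import Iterable, List, Tuple

# Pattern derived from the log excerpt in this post (an assumption, not an
# official Hadoop log specification): timestamp, level ERROR, logger name,
# then "RECEIVED SIGNAL <number>: <name>".
SIGNAL_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ERROR \S+: "
    r"RECEIVED SIGNAL (\d+): (\w+)"
)


def find_signals(lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (timestamp, signal number, signal name) for each signal line."""
    hits = []
    for line in lines:
        match = SIGNAL_RE.match(line)
        if match:
            hits.append((match.group(1), int(match.group(2)), match.group(3)))
    return hits


# Two lines copied from the excerpt above, used as sample input.
sample = [
    "2016-02-05 11:36:05,224 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM",
    "2016-02-05 11:36:05,230 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:",
]

for timestamp, number, name in find_signals(sample):
    print(timestamp, number, name)
```

Running this over the log file of each node (for example by passing an open file object to `find_signals`) gives a list of shutdown times that can be lined up against the start time of the job.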
Any idea what is going on? I am using Hadoop 2.4.1 and, as I said, this cluster had been running for months without problems.
Thanks.