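I have a Hadoop cluster on Cloudera with 4 nodes: 1 master and 3 slaves, with a replication factor of 3. For the past few days the cluster's disk usage has kept growing for no apparent reason. I am not running any jobs, yet the space left on the device shrinks within minutes. I then delete some files and change a few things. There are logs on both my Hadoop master and the datanodes.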
Part of the log files.
Hadoop master node
2015-07-17 09:30:49,637 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.20.1.45 cmd=listCachePools src=null dst=null perm=null proto=rpc
2015-07-17 09:30:49,649 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.20.1.45 cmd=create src=/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2015_07_17-09_30_49 dst=null perm=hdfs:supergroup:rw-rw-rw- proto=rpc
2015-07-17 09:30:49,684 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.20.1.45 cmd=open src=/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2015_07_17-09_30_49 dst=null perm=null proto=rpc
2015-07-17 09:30:49,699 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.20.1.45 cmd=delete src=/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2015_07_17-09_30_49 dst=null perm=null proto=rpc
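Three of the four audit entries above refer to the same path under /tmp/.cloudera_health_monitoring_canary_files. As a rough sketch (not a diagnosis), the lines can be grouped by their src field to see which commands hit which path. The helper name commands_by_path is hypothetical, and the regular expression assumes the space-separated key=value layout shown in the excerpt.

```python
import re
from collections import defaultdict

# Lines copied verbatim from the NameNode audit log excerpt above.
AUDIT_LINES = [
    "2015-07-17 09:30:49,637 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.20.1.45 cmd=listCachePools src=null dst=null perm=null proto=rpc",
    "2015-07-17 09:30:49,649 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.20.1.45 cmd=create src=/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2015_07_17-09_30_49 dst=null perm=hdfs:supergroup:rw-rw-rw- proto=rpc",
    "2015-07-17 09:30:49,684 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.20.1.45 cmd=open src=/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2015_07_17-09_30_49 dst=null perm=null proto=rpc",
    "2015-07-17 09:30:49,699 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.20.1.45 cmd=delete src=/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2015_07_17-09_30_49 dst=null perm=null proto=rpc",
]

# Assumption: cmd= is immediately followed by src=, as in the excerpt.
AUDIT_RE = re.compile(r"cmd=(\S+)\s+src=(\S+)")


def commands_by_path(lines):
    """Map each src path to the ordered list of commands issued against it."""
    ops = defaultdict(list)
    for line in lines:
        match = AUDIT_RE.search(line)
        if match:
            cmd, src = match.groups()
            ops[src].append(cmd)
    return dict(ops)


if __name__ == "__main__":
    for src, cmds in commands_by_path(AUDIT_LINES).items():
        print(src, "->", ", ".join(cmds))
```

For this excerpt, the canary file is created, opened, and deleted within about 50 ms, which is consistent with a periodic health check rather than a user job. Running the same grouping over a longer stretch of the audit log would show whether any other path is being written to.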
Hadoop data node
2015-07-17 09:30:49,663 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-634864778-172.20.1.45-1399358938139:blk_1074658739_919097 src: /172.20.1.48:59941 dest: /172.20.1.46:50010
2015-07-17 09:30:49,669 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.20.1.48:59941, dest: /172.20.1.46:50010, bytes: 56, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-824197314_132, offset: 0, srvID: aa5e5f0e-4198-4df5-8dfa-6e7c57e6307d, blockid: BP-634864778-172.20.1.45-1399358938139:blk_1074658739_919097, duration: 4771606
2015-07-17 09:30:49,669 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-634864778-172.20.1.45-1399358938139:blk_1074658739_919097, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2015-07-17 09:30:51,406 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1074658739_919097 file /dfs/dn/current/BP-634864778-172.20.1.45-1399358938139/current/finalized/subdir13/subdir253/blk_1074658739 for deletion
2015-07-17 09:30:51,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-634864778-172.20.1.45-1399358938139 blk_1074658739_919097 file /dfs/dn/current/BP-634864778-172.20.1.45-1399358938139/current/finalized/subdir13/subdir253/blk_1074658739
pl.FsDatasetAsyncDiskService: Deleted BP-634864778-172.20.1.45-1399358938139 blk_1074658740_919098 file /dfs/dn/current/BP-634864778-172.20.1.45-1399358938139/current/finalized/subdir13/subdir253/blk_1074658740
2015-07-17 09:32:54,684 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-634864778-172.20.1.45-1399358938139:blk_1074658741_919099 src: /172.20.1.48:33789 dest: /172.20.1.47:50010
2015-07-17 09:32:54,725 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.20.1.48:33789, dest: /172.20.1.47:50010, bytes: 56, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_705538126_132, offset: 0, srvID: bff71ff1-db18-438a-b2ba-4731fa36d44e, blockid: BP-634864778-172.20.1.45-1399358938139:blk_1074658741_919099, duration: 39309294
2015-07-17 09:32:54,725 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-634864778-172.20.1.45-1399358938139:blk_1074658741_919099, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2015-07-17 09:32:55,909 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
2015-07-17 09:32:55,911 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
At this point, all of my cluster services had stopped.
Do you know what might be happening? Any help would be greatly appreciated.
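The DataNode clienttrace entries in the excerpt each record a byte count and an operation. A minimal sketch, assuming the "bytes: N, op: NAME" layout seen in those lines, totals the bytes per operation so the logged writes can be compared against the amount of space that is disappearing. The helper name bytes_by_op is hypothetical.

```python
import re
from collections import Counter

# The two clienttrace lines copied verbatim from the DataNode log excerpt.
TRACE_LINES = [
    "2015-07-17 09:30:49,669 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.20.1.48:59941, dest: /172.20.1.46:50010, bytes: 56, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-824197314_132, offset: 0, srvID: aa5e5f0e-4198-4df5-8dfa-6e7c57e6307d, blockid: BP-634864778-172.20.1.45-1399358938139:blk_1074658739_919097, duration: 4771606",
    "2015-07-17 09:32:54,725 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.20.1.48:33789, dest: /172.20.1.47:50010, bytes: 56, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_705538126_132, offset: 0, srvID: bff71ff1-db18-438a-b2ba-4731fa36d44e, blockid: BP-634864778-172.20.1.45-1399358938139:blk_1074658741_919099, duration: 39309294",
]

# Assumption: the fields appear in the order "bytes: <n>, op: <name>".
TRACE_RE = re.compile(r"bytes: (\d+), op: (\w+)")


def bytes_by_op(lines):
    """Sum the logged byte counts for each operation type."""
    totals = Counter()
    for line in lines:
        match = TRACE_RE.search(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return dict(totals)


if __name__ == "__main__":
    for op, total in bytes_by_op(TRACE_LINES).items():
        print(op, total, "bytes")
```

In the excerpt both writes are 56 bytes, and the first block is scheduled for deletion about two seconds after it is received, so these particular entries would not account for space shrinking within minutes. Applying the same tally to the full DataNode log is one way to check whether HDFS writes explain the growth at all.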
1 Answer
pexxcrt21#
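I added some datanodes to a prod cluster running Cloudera Manager 5.4 and CDH 5.4. Each node is configured as follows: 12 disks, each mounted as a different file system, with /var, /tmp, and the OS on separate disks. As soon as I added the datanodes, every volume was immediately filled with 46.9 GB of data (almost 5% of each disk's capacity). This was before running the rebalancer.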
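To confirm a figure like "46.9 GB, almost 5% of each disk" on every volume, the usage can be read directly from the operating system. The sketch below uses Python's shutil.disk_usage; the mount points in the usage example are hypothetical placeholders (the post does not give the actual paths), and the helper name volume_usage is made up for illustration.

```python
import shutil


def volume_usage(paths):
    """Return {path: (used_gb, used_percent)} for each mount point."""
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        used_gb = usage.used / 1e9
        used_pct = 100.0 * usage.used / usage.total if usage.total else 0.0
        report[path] = (used_gb, used_pct)
    return report


if __name__ == "__main__":
    # Hypothetical example: replace "." with the DataNode's real mount points,
    # e.g. one entry per data disk.
    for path, (gb, pct) in volume_usage(["."]).items():
        print(f"{path}: {gb:.1f} GB used ({pct:.1f}%)")
```

Note that 46.9 GB being roughly 5% of a disk implies a capacity of about 46.9 / 0.05 ≈ 938 GB per volume. Comparing the OS-level numbers from this sketch with what HDFS reports as DFS used would show whether the space is taken by HDFS blocks or by something outside HDFS.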