When does block replication need to be set to 1?

0lvr5msh  posted on 2021-05-29  in Hadoop
Follow (0) | Answers (1) | Views (420)

We are getting the following message in our Spark logs:

java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage ..., DatanodeInfoWithStorage ...]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1036)

My Ambari cluster has only 3 worker machines, and each worker has only one data disk.
I searched on Google and found that the solution might be related to block replication. Block replication in HDFS is set to 3 by default, and I found the recommendation to set block replication to 1 instead of 3.
Question: does that make sense?
Additionally, is the fact that my worker machines have only one data disk also part of the problem?
Block replication = the total number of copies of each file in the file system; it is specified by the dfs.replication factor, and setting dfs.replication=1 means keeping only one copy of each file in the file system.
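For reference, here is a minimal sketch of what I understand the suggested change to mean on the Spark client side; only dfs.replication is the actual property in question, and the session setup around it is illustrative:

import org.apache.spark.sql.SparkSession

// Sketch only: ask the HDFS client to create the files written by this Spark
// application with a single replica. The cluster-wide default would instead
// be changed via dfs.replication in hdfs-site.xml (e.g. through Ambari).
val spark = SparkSession.builder()
  .appName("replication-test") // hypothetical application name
  .getOrCreate()

spark.sparkContext.hadoopConfiguration.set("dfs.replication", "1")
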
Full log:

java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[34.2.31.31:50010,DS-8234bb39-0fd4-49be-98ba-32080bc24fa9,DISK], DatanodeInfoWithStorage[34.2.31.33:50010,DS-b4758979-52a2-4238-99f0-1b5ec45a7e25,DISK]], original=[DatanodeInfoWithStorage[34.2.31.31:50010,DS-8234bb39-0fd4-49be-98ba-32080bc24fa9,DISK], DatanodeInfoWithStorage[34.2.31.33:50010,DS-b4758979-52a2-4238-99f0-1b5ec45a7e25,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1036)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1110)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1268)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:993)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:500)
---2018-01-30T15:15:15.015 INFO  [][][] [dal.locations.LocationsDataFramesHandler]

Answer 1 (2sbarzqh):

I faced the same problem. The default block replication is 3, so unless you specify otherwise, all files you create get a replication factor of 3.
If any datanode becomes unreachable (because of a network problem or lack of disk space), replication will fail.
Check the datanode status with:

hdfs dfsadmin -report

In my case I had two development nodes, one master node and one data node, so I changed the replication factor to 1.
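Note that lowering dfs.replication only applies to files created afterwards. As a sketch (not something from my original setup), the replication of files that already exist can be reduced through the Hadoop FileSystem API; the path below is a placeholder:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch: reduce the replication factor of an existing file to 1.
val conf = new Configuration()
conf.set("dfs.replication", "1")   // default for files created from now on
val fs = FileSystem.get(conf)
fs.setReplication(new Path("/tmp/appendtest.txt"), 1.toShort) // placeholder path
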
You can first test this from the HDFS CLI, as follows:

echo "test file line1" > copy1
echo "test file line2" > copy2
hdfs dfs -Ddfs.replication=1 -touchz /tmp/appendtest.txt
hdfs dfs -appendToFile copy1 /tmp/appendtest.txt
hdfs dfs -appendToFile copy2 /tmp/appendtest.txt

If you do not specify the replication factor with the touchz command, you get the same error when trying to append the local file copy2. The following configuration of the hdfsConfiguration object solved the problem for me:

hdfsConfiguration.set("fs.defaultFS", configuration.getString("hdfs.uri"))
  hdfsConfiguration.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
  hdfsConfiguration.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
  hdfsConfiguration.set("dfs.support.append", "true")
  hdfsConfiguration.set("dfs.replication", "1")
