distcp fails with "No space left on device"

Posted by mznpcxlj on 2021-06-03 in Hadoop

I am copying an HDFS snapshot to an S3 bucket and it fails with the error below. The command I am running is:

hadoop distcp /.snapshot/$snapshotname s3a://$accesskey:$secretkey@$bucket/$snapshotname
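As an aside, the credentials do not have to be embedded in the destination URI, where they end up in job logs (as in the stack traces below). A minimal alternative sketch, assuming a Hadoop version whose s3a connector supports the standard fs.s3a.access.key and fs.s3a.secret.key properties:

# Sketch: pass the S3 credentials as Hadoop properties instead of in the URI
hadoop distcp \
  -D fs.s3a.access.key=$accesskey \
  -D fs.s3a.secret.key=$secretkey \
  /.snapshot/$snapshotname \
  s3a://$bucket/$snapshotname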

15/08/20 06:50:07 INFO mapreduce.Job:  map 38% reduce 0%
15/08/20 06:50:08 INFO mapreduce.Job:  map 39% reduce 0%
15/08/20 06:52:15 INFO mapreduce.Job:  map 41% reduce 0%
15/08/20 06:52:37 INFO mapreduce.Job: Task Id : attempt_1439998402428_0006_m_000004_0, Status : FAILED
Error: java.io.IOException: File copy failed: hdfs://mycluster/.snapshot/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/tmp/hive/vladmetodiev/6da8eee9-f482-4d07-96dc-87ff77a4efe4/hive_2015-07-23_17-12-21_989_8312247652079703611-121/-ext-10001/000035_0 --> s3n://AKIAJPPHQ6RXAPWCFMAA:RVZ9Q1+ezHkUVPEbasg4BUIGAS59C27bhJiNNlgD@ul-pdc-eu/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/tmp/hive/vladmetodiev/6da8eee9-f482-4d07-96dc-87ff77a4efe4/hive_2015-07-23_17-12-21_989_8312247652079703611-121/-ext-10001/000035_0
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://mycluster/.snapshot/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/tmp/hive/vladmetodiev/6da8eee9-f482-4d07-96dc-87ff77a4efe4/hive_2015-07-23_17-12-21_989_8312247652079703611-121/-ext-10001/000035_0 to s3n://AKIAJPPHQ6RXAPWCFMAA:RVZ9Q1+ezHkUVPEbasg4BUIGAS59C27bhJiNNlgD@ul-pdc-eu/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/tmp/hive/vladmetodiev/6da8eee9-f482-4d07-96dc-87ff77a4efe4/hive_2015-07-23_17-12-21_989_8312247652079703611-121/-ext-10001/000035_0
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
        ... 10 more
Caused by: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at java.security.DigestOutputStream.write(DigestOutputStream.java:148)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.write(NativeS3FileSystem.java:293)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:255)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:184)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:124)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
        ... 11 more

15/08/20 06:53:13 INFO mapreduce.Job: Task Id : attempt_1439998402428_0006_m_000007_0, Status : FAILED
Error: java.io.IOException: File copy failed: hdfs://mycluster/.snapshot/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/apps/hbase/data/data/default/XXHBCL01/985fbc7692868e3315ada852bcb59e1d/tr/77c160e32bfc4175a65d6a56feaeeb6c --> s3n://AKIAJPPHQ6RXAPWCFMAA:RVZ9Q1+ezHkUVPEbasg4BUIGAS59C27bhJiNNlgD@ul-pdc-eu/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/apps/hbase/data/data/default/XXHBCL01/985fbc7692868e3315ada852bcb59e1d/tr/77c160e32bfc4175a65d6a56feaeeb6c
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://mycluster/.snapshot/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/apps/hbase/data/data/default/XXHBCL01/985fbc7692868e3315ada852bcb59e1d/tr/77c160e32bfc4175a65d6a56feaeeb6c to s3n://AKIAJPPHQ6RXAPWCFMAA:RVZ9Q1+ezHkUVPEbasg4BUIGAS59C27bhJiNNlgD@ul-pdc-eu/HDFS-SNAPSHOT-PROD.08-20-2015-06-06/apps/hbase/data/data/default/XXHBCL01/985fbc7692868e3315ada852bcb59e1d/tr/77c160e32bfc4175a65d6a56feaeeb6c
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
        ... 10 more
Caused by: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at java.security.DigestOutputStream.write(DigestOutputStream.java:148)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.write(NativeS3FileSystem.java:293)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:255)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:184)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:124)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
        ... 11 more

But there is plenty of space on the device, around 4 TB. Please help.

y0u0uwnf 1#

I ran into the same problem, and here is what finally worked for me:
hadoop distcp -D mapreduce.job.maxtaskfailures.per.tracker=1 ...
I tried a few things (with help from a colleague), but what made the difference was setting the maximum task failures per tracker to 1. That was the key change. Essentially, each node was running out of local disk space: the s3n output stream in the stack trace buffers every file to the node's local disk before uploading it, so the "device" that fills up is the task node's local disk, not HDFS or the bucket. Setting the limit to 1 forces the job not to retry on a node once a task has failed there.
Other things I tried that did not work:
1. increasing the number of mappers (-m)
2. increasing the number of retries from 3 to 12 (-D yarn.app.mapreduce.client.max-retries=12)
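Putting it together, a sketch of the full invocation, assuming the same source and destination as in the question (any other flags were elided in the original answer):

# Sketch: blacklist a node for this job after a single task failure on it
hadoop distcp \
  -D mapreduce.job.maxtaskfailures.per.tracker=1 \
  /.snapshot/$snapshotname \
  s3a://$accesskey:$secretkey@$bucket/$snapshotname

With this property set to 1, YARN stops scheduling the job's tasks on a node after one failure there, so retries land on nodes that still have free local buffer space.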
