httpclient.RestS3Service 404 error when running distcp

thigvfpy  posted on 2021-06-03  in  Hadoop

I can't run distcp from S3 to HDFS on a freshly installed CDH4 cluster on EC2. I can't list an S3 directory from that cluster either.

ubuntu@ip-10-145-227-232:~$ hadoop distcp s3://access_key:secret_key@bucket/logs hdfs://ip-10-145-227-232.ec2.internal:8020/tmp
13/05/20 19:07:45 INFO tools.DistCp: srcPaths=[s3://access_key:secret_key@bucket/logs]
13/05/20 19:07:45 INFO tools.DistCp: destPath=hdfs://ip-10-145-227-232.ec2.internal:8020/tmp
13/05/20 19:07:48 WARN httpclient.RestS3Service: Response '/%2Flogs' - Unexpected response code 404, expected 200
13/05/20 19:07:48 WARN httpclient.RestS3Service: Response '/%2Flogs' - Received error response with XML message
With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input source s3://access_key:secret_key@bucket/logs does not exist.
    at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:641)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

At the same time, a similar command lets me list and copy on an HBase EMR cluster.

hadoop@ip-10-165-7-106:~$ hadoop distcp s3://access_key:secret_key@bucket/logs/ hdfs://10.165.7.106:9000/test/
13/05/20 19:01:50 INFO tools.DistCp: srcPaths=[s3://access_key:secret_key@bucket/logs]
13/05/20 19:01:50 INFO tools.DistCp: destPath=hdfs://10.165.7.106:9000/test
13/05/20 19:04:47 INFO tools.DistCp: sourcePathsCount=11149
13/05/20 19:04:47 INFO tools.DistCp: filesToCopyCount=7816
13/05/20 19:04:47 INFO tools.DistCp: bytesToCopyCount=443.9m
13/05/20 19:04:47 INFO mapred.JobClient: Default number of map tasks: 1
13/05/20 19:04:47 INFO mapred.JobClient: Default number of reduce tasks: 0
13/05/20 19:04:47 INFO security.ShellBasedUnixGroupsMapping: add hadoop to shell userGroupsCache
13/05/20 19:04:47 INFO mapred.JobClient: Setting group to hadoop
13/05/20 19:04:48 INFO mapred.JobClient: Running job: job_201305201846_0001
13/05/20 19:04:49 INFO mapred.JobClient:  map 0% reduce 0%
13/05/20 19:05:10 INFO mapred.JobClient:  map 1% reduce 0%
13/05/20 19:05:22 INFO mapred.JobClient:  map 2% reduce 0%
13/05/20 19:05:31 INFO mapred.JobClient:  map 3% reduce 0%
13/05/20 19:05:40 INFO mapred.JobClient:  map 4% reduce 0%

Please help! Thank you very much!
Update: after replacing "s3" with "s3n" on the CDH cluster, I can list the files, but distcp still fails.
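
A note on the update: in Apache Hadoop of that era (and so in CDH4), the s3:// scheme is the block-based S3 filesystem, while s3n:// is the "native" one that reads ordinary S3 objects. On EMR, s3:// refers to ordinary objects. That would be consistent with the 404 on '/%2Flogs' above and with listing working once the scheme is s3n. The sketch below only prints the commands one might try next, with the credentials passed as the fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey properties through -D instead of being embedded in the URI. The helper names (s3n_uri, ls_cmd, distcp_cmd) and the ACCESS_KEY / SECRET_KEY variables are made up for illustration, and the bucket, path and destination are the placeholders from the question. Whether this resolves the remaining distcp failure is not known from the post.

```shell
#!/bin/sh
# Dry-run sketch: prints the Hadoop commands instead of executing them,
# so it runs without a Hadoop installation.

# Hypothetical variables; the defaults are the placeholders from the question.
ACCESS_KEY="${ACCESS_KEY:-access_key}"
SECRET_KEY="${SECRET_KEY:-secret_key}"

# Build an s3n:// URI with no credentials in it.
# $1 = bucket, $2 = key prefix
s3n_uri() {
  printf 's3n://%s/%s' "$1" "$2"
}

# Print a listing command. Generic -D options go before -ls.
# $1 = bucket, $2 = key prefix
ls_cmd() {
  printf 'hadoop fs -Dfs.s3n.awsAccessKeyId=%s -Dfs.s3n.awsSecretAccessKey=%s -ls %s\n' \
    "$ACCESS_KEY" "$SECRET_KEY" "$(s3n_uri "$1" "$2")"
}

# Print a distcp command that reads from s3n:// and writes to the destination.
# $1 = bucket, $2 = key prefix, $3 = destination URI
distcp_cmd() {
  printf 'hadoop distcp -Dfs.s3n.awsAccessKeyId=%s -Dfs.s3n.awsSecretAccessKey=%s %s %s\n' \
    "$ACCESS_KEY" "$SECRET_KEY" "$(s3n_uri "$1" "$2")" "$3"
}

ls_cmd bucket logs
distcp_cmd bucket logs hdfs://ip-10-145-227-232.ec2.internal:8020/tmp
```

Passing the keys as properties also sidesteps problems with secret keys that contain a "/" when they are placed in the URI. Because -D properties set on the distcp command line become part of the job configuration, the map tasks should see the same credentials as the client. If distcp still fails, the task logs of the failed job are the next place to look.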

No answers yet.
