I have successfully completed a Mahout vectorizing job on Amazon EMR (using Mahout on Elastic MapReduce as a reference). Now I want to copy the results from HDFS to S3 (to use them in future clusters).
For that I've used hadoop distcp:
den@aws:~$ elastic-mapreduce --jar s3://elasticmapreduce/samples/distcp/distcp.jar \
> --arg hdfs://my.bucket/prj1/seqfiles \
> --arg s3n://ACCESS_KEY:SECRET_KEY@my.bucket/prj1/seqfiles \
> -j $JOBID
It failed. I then found a suggestion to use s3distcp and tried that as well:
elastic-mapreduce --jobflow $JOBID \
> --jar --arg s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar \
> --arg --s3Endpoint --arg 's3-eu-west-1.amazonaws.com' \
> --arg --src --arg 'hdfs://my.bucket/prj1/seqfiles' \
> --arg --dest --arg 's3://my.bucket/prj1/seqfiles'
In both cases I get the same error: java.net.UnknownHostException: unknown host: my.bucket
The full error output for the second case is below.
2012-09-06 13:25:08,209 FATAL com.amazon.external.elasticmapreduce.s3distcp.S3DistCp (main): Failed to get source file system
java.net.UnknownHostException: unknown host: my.bucket
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1193)
at org.apache.hadoop.ipc.Client.call(Client.java:1047)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:401)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:127)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:249)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:214)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1413)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:68)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1431)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:256)
at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:431)
at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:216)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.amazon.external.elasticmapreduce.s3distcp.Main.main(Main.java:12)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
1 Answer
I've found the mistake:
The main problem was not:
java.net.UnknownHostException: unknown host: my.bucket
but this line:
2012-09-06 13:25:08,209 FATAL com.amazon.external.elasticmapreduce.s3distcp.S3DistCp (main): Failed to get source file system
So: with hdfs://my.bucket/..., Hadoop parses my.bucket as the NameNode hostname (hence the UnknownHostException), while hdfs:/// has an empty authority, so the cluster's default NameNode is used and /my.bucket/... is treated as an ordinary path. After adding one more slash to the source path, the job started without problems. The correct command (the same s3distcp invocation, with the extra slash in --src) is:
elastic-mapreduce --jobflow $JOBID \
> --jar --arg s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar \
> --arg --s3Endpoint --arg 's3-eu-west-1.amazonaws.com' \
> --arg --src --arg 'hdfs:///my.bucket/prj1/seqfiles' \
> --arg --dest --arg 's3://my.bucket/prj1/seqfiles'
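To make the distinction concrete, here is a minimal Java sketch (my illustration, not part of the original answer) using java.net.URI, which splits URIs the same way Hadoop's Path/FileSystem code does: with two slashes the first segment becomes the authority (the NameNode host), with three slashes it stays in the path.

import java.net.URI;

public class HdfsUriDemo {
    public static void main(String[] args) {
        // Two slashes: "my.bucket" is parsed as the URI authority, so the
        // HDFS client tries to reach a NameNode named "my.bucket".
        URI two = URI.create("hdfs://my.bucket/prj1/seqfiles");
        System.out.println(two.getHost());   // my.bucket
        System.out.println(two.getPath());   // /prj1/seqfiles

        // Three slashes: the authority is empty, so Hadoop falls back to the
        // cluster's default NameNode and keeps the bucket name in the path.
        URI three = URI.create("hdfs:///my.bucket/prj1/seqfiles");
        System.out.println(three.getHost()); // null
        System.out.println(three.getPath()); // /my.bucket/prj1/seqfiles
    }
}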
P.S. So it is working. The job finished correctly, and I've successfully copied the directory with a 30 GB file.
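As a possible follow-up check (a hedged sketch of mine, not from the original answer), the copied directory can be listed with the Hadoop FileSystem API of that generation; ACCESS_KEY and SECRET_KEY are placeholders, and fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey are the standard credential properties for the s3n:// filesystem:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VerifyS3Copy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Credentials can also be embedded in the URI, as in the question.
        conf.set("fs.s3n.awsAccessKeyId", "ACCESS_KEY");
        conf.set("fs.s3n.awsSecretAccessKey", "SECRET_KEY");

        // List everything that landed under the destination prefix.
        FileSystem fs = FileSystem.get(URI.create("s3n://my.bucket/"), conf);
        for (FileStatus status : fs.listStatus(new Path("/prj1/seqfiles"))) {
            System.out.println(status.getPath() + "\t" + status.getLen());
        }
    }
}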