s3distcp fails on CDH4.2

Posted by 8ftvxx2r on 2021-06-03 in Hadoop

I'm trying to run s3distcp to merge many small files (200-600 KB each) from S3 into HDFS.
I'm running Hadoop from CDH4.2 on Ubuntu.
Specifically: Hadoop 2.0.0-cdh4.2.0 Subversion file:///var/lib/jenkins/workspace/generic-package-ubuntu64-12-04/cdh4.2.0-packaging-hadoop-2013-02-15\ 10-38-54/hadoop-2.0.0+922-1.cdh4.2.0.p0.12~precise/src/hadoop-common-project/hadoop-common -r 8bce4bd28a464e0c92950c50ba01a9deb1d85686
I had already resolved all of the dependencies by copying aws-java-sdk-1.4.1.jar and s3distcp.jar into the Hadoop classpath, and I also installed libsnappy1.
But when I run:

hdfs@test-cdh-03-master:/home/ubuntu$ hadoop jar /usr/lib/hadoop/lib/s3distcp.jar --src 's3n://workdir-XXXX-YYYYlogs/production-YYYYYlogs/Log-FFFFFFF-click/'  --dest 'hdfs:///test/'  --groupBy 'Log-FFFFF(.*)'

I get the following stack trace:

13/04/08 14:36:30 INFO s3distcp.S3DistCp: Using output path 'hdfs:/tmp/ab7c0a09-07ba-4592-b354-bcd0dd3d6a07/output'
13/04/08 14:36:36 INFO s3distcp.S3DistCp: Created 0 files to copy 0 files
13/04/08 14:36:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/04/08 14:36:37 INFO mapred.JobClient: Cleaning up the staging area hdfs://test-cdh-03-master.extc.test-cdh-03.adswizz.com/tmp/hadoop-temp/mapred/staging/hdfs/.staging/job_201304041515_0016
13/04/08 14:36:37 ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs:/tmp/ab7c0a09-07ba-4592-b354-bcd0dd3d6a07/files
13/04/08 14:36:37 INFO s3distcp.S3DistCp: Try to recursively delete hdfs:/tmp/ab7c0a09-07ba-4592-b354-bcd0dd3d6a07/tempspace
Exception in thread "main" java.lang.RuntimeException: Error running job
    at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:586)
    at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:216)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at com.amazon.external.elasticmapreduce.s3distcp.Main.main(Main.java:12)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs:/tmp/ab7c0a09-07ba-4592-b354-bcd0dd3d6a07/files
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:194)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:40)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:205)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1091)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1083)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:993)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:946)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:946)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:920)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1369)
    at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:568)
    ... 9 more

Is there anything else I should try? Is there something wrong with the regex that I'm not seeing?
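The log reports `Created 0 files to copy 0 files`, which would be consistent with the `--groupBy` pattern not matching any source path. A quick way to test a pattern offline is sketched below. It assumes (not confirmed by the question) that s3distcp requires the pattern to match the entire path, as Java's `Matcher.matches()` does, and that files sharing the same value in the capture group are concatenated into one output file. The sample key `part-0001.log` is hypothetical. `grep -x` and `sed -E` use POSIX extended regexes, which behave like Java regexes for simple patterns such as these.

```shell
#!/bin/sh
# Hypothetical sample key; replace it with a real object path from the source prefix.
path='s3n://workdir-XXXX-YYYYlogs/production-YYYYYlogs/Log-FFFFFFF-click/part-0001.log'

# Report whether PATTERN matches the *whole* path (the assumed s3distcp behaviour)
# and, if it does, print the text captured by the first group (the grouping key).
check() {
  pattern=$1
  if printf '%s\n' "$path" | grep -Eqx "$pattern"; then
    key=$(printf '%s\n' "$path" | sed -nE "s|^${pattern}\$|\\1|p")
    echo "full match, group key: $key"
  else
    echo "no full match: $pattern"
  fi
}

check 'Log-FFFFF(.*)'
check '.*(Log-FFFFFFF-click).*'
```

Under that assumption, `Log-FFFFF(.*)` never matches because every path starts with `s3n://`, so nothing would be selected for copying. A pattern with a leading `.*` whose capture group is identical for all files, such as `.*(Log-FFFFFFF-click).*`, would place them in a single group.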
