Is anyone using Hadoop/Spark 1.6.0 with S3 in Frankfurt?
I am trying to store the results of a job on S3, and my dependencies are declared as follows:
"org.apache.spark" %% "spark-core" % "1.6.0" exclude("org.apache.hadoop", "hadoop-client"),
"org.apache.spark" %% "spark-sql" % "1.6.0",
"org.apache.hadoop" % "hadoop-client" % "2.7.2",
"org.apache.hadoop" % "hadoop-aws" % "2.7.2"
I have set the following configuration:
System.setProperty("com.amazonaws.services.s3.enableV4", "true")
sc.hadoopConfiguration.set("fs.s3a.endpoint", ""s3.eu-central-1.amazonaws.com")
When I call saveAsTextFile on my RDD, it starts fine and saves everything to S3. However, when it copies the final output out of _temporary, it produces the following error:
Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: XXXXXXXXXXXXXXXX, AWS Error Code: SignatureDoesNotMatch, AWS Error Message: The request signature we calculated does not match the signature you provided. Check your key and signing method., S3 Extended Request ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at com.amazonaws.services.s3.AmazonS3Client.copyObject(AmazonS3Client.java:1507)
at com.amazonaws.services.s3.transfer.internal.CopyCallable.copyInOneChunk(CopyCallable.java:143)
at com.amazonaws.services.s3.transfer.internal.CopyCallable.call(CopyCallable.java:131)
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.copy(CopyMonitor.java:189)
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:134)
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
If I use the hadoop-client bundled with the Spark package, it does not even start the transfer. The error occurs randomly: sometimes it works, sometimes it does not.
3 Answers
v9tzhpje1#
Please try setting the following values:
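(The snippet itself was lost when the page was archived; what follows is a hedged reconstruction based on the standard fix for V4-only regions: the AWS SDK's V4 switch has to reach both the driver and the executor JVMs, for example via spark-defaults.conf or the equivalent --conf flags.)

spark.driver.extraJavaOptions    -Dcom.amazonaws.services.s3.enableV4=true
spark.executor.extraJavaOptions  -Dcom.amazonaws.services.s3.enableV4=true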
Also set the region your bucket lives in; in my example that is:
eu-central-1
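(Again a reconstruction rather than the answer's verbatim code: the region maps onto the s3a endpoint, which can be set through Spark's spark.hadoop.* passthrough.)

spark.hadoop.fs.s3a.endpoint     s3.eu-central-1.amazonaws.com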
And add the dependency to Gradle or your build tool of choice (the answer's original snippet was not preserved; compare the sbt lines in the question). Hope this helps.
elcex8rz2#
Inspired by the other answers, running the following commands directly in the pyspark shell produced the desired output for me:
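(The commands were stripped from the archived page; below is a hedged reconstruction of the usual pyspark-shell steps, with placeholder credentials and bucket. It assumes the shell was started with the V4 flag, e.g. pyspark --driver-java-options "-Dcom.amazonaws.services.s3.enableV4=true".)

hconf = sc._jsc.hadoopConfiguration()  # the JVM-side Hadoop configuration
hconf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")   # placeholder
hconf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")   # placeholder
hconf.set("fs.s3a.endpoint", "s3.ap-south-1.amazonaws.com")  # Mumbai, V4-only
sc.textFile("s3a://your-bucket/some-file.txt").count()  # hypothetical bucket/key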
Pick your endpoint from the AWS list of endpoints. I was able to get data from Asia Pacific (Mumbai) (ap-south-1), which is a Version-4-only region.
emeijp433#
If you are using pyspark, the following worked for me:
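(This answer's code block was also lost; here is a minimal, hedged sketch of a standalone pyspark job with the same V4 fix, rather than the answerer's exact code. The bucket name and output path are placeholders.)

from pyspark import SparkConf, SparkContext

# Executors are launched after the context exists, so their V4 flag can be set
# in code; the driver JVM is already running, so its flag is set via py4j below
# (or pass --driver-java-options on spark-submit instead).
conf = (SparkConf()
        .setAppName("s3a-v4-write")
        .set("spark.executor.extraJavaOptions",
             "-Dcom.amazonaws.services.s3.enableV4=true"))
sc = SparkContext(conf=conf)

# Enable V4 signing in the driver JVM through the py4j gateway.
sc._jvm.java.lang.System.setProperty("com.amazonaws.services.s3.enableV4", "true")

hconf = sc._jsc.hadoopConfiguration()
hconf.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")  # Frankfurt

# Placeholder write to check that the _temporary -> final copy now succeeds.
sc.parallelize(["some", "job", "output"]).saveAsTextFile("s3a://my-bucket/result")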