Spark with AWS S3 when running in Docker - certificate doesn't match any of the subject alternative names

wyyhbhjk posted on 2023-04-21 in Apache

Running into a weird certificate issue that I've been debugging for days and took multiple stabs at this.
My application simply uploads a directory to an S3 bucket and then pulls that directory back down from the same bucket into a Spark DataFrame.
I'm only using Apache Spark, hadoop-aws, and aws-java-sdk-bundle.
Spark 3.1.1, Scala 2.12, Hadoop 3.2.0, and AWS Java SDK 1.11.901.
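(The post doesn't include code; a minimal PySpark sketch of the described flow, with a hypothetical bucket and directory, would look roughly like this:)

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Write a small directory of data to the bucket (this part succeeds in Docker)
df_out = spark.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'value'])
df_out.write.mode('overwrite').parquet('s3a://bucket-name/directory')

# Read the same directory back into a DataFrame (this is where getFileStatus fails)
df_in = spark.read.parquet('s3a://bucket-name/directory')
df_in.show()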

  • verified AWS secret key and access key are 100% correct
  • Running application locally without docker works without any issues

When I run my application with Docker I'm able to upload the directory, but when I attempt to connect and read the directory back I get this stack trace of exceptions (probably just propagation of the first exception that occurs):

Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://bucket-name/directory: com.amazonaws.SdkClientException: Unable to execute HTTP request: Certificate for <bucket-name.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]: Unable to execute HTTP request: Certificate for <bucket-name.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Certificate for <bucket-name.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Certificate for <bucket-name.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
    at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:507)
    at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:437)
    at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:384)
    at com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
    at com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
    at sun.reflect.GeneratedMethodAccessor137.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
    at com.amazonaws.http.conn.$Proxy60.connect(Unknown Source)
    at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
    at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1333)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)

I find it odd because my colleague is using the same credentials as me, but he does not run into this issue at all.
Any ideas on why this might be happening?

  • Could it be something with the bucket policy?

8zzbczxx1#

How to read AWS S3 using Docker, Spark, and Python

Docker image:

PySpark code:

from pyspark import SparkConf
from pyspark.sql import SparkSession

# hadoop-aws should match the Hadoop version bundled into the Spark image
HADOOP_VERSION = '3.3.1'

packages = [
    f'org.apache.hadoop:hadoop-aws:{HADOOP_VERSION}',
    'com.google.guava:guava:31.1-jre',
    'org.apache.httpcomponents:httpcore:4.4.14', 
    'com.google.inject:guice:4.2.2', 
    'com.google.inject.extensions:guice-servlet:4.2.2'
]

# `credentials` is assumed to hold temporary STS credentials (e.g. from a boto3
# assume_role / get_session_token call), hence the TemporaryAWSCredentialsProvider below
conf = SparkConf().setAll([
    ('spark.jars.packages', ','.join(packages)),
    ('spark.hadoop.fs.s3a.aws.credentials.provider', 'org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider'),
    ('spark.hadoop.fs.s3a.access.key', credentials['AccessKeyId']),
    ('spark.hadoop.fs.s3a.secret.key', credentials['SecretAccessKey']),
    ('spark.hadoop.fs.s3a.session.token', credentials['SessionToken']),
    # use path-style URLs (s3.amazonaws.com/<bucket>) instead of virtual-hosted-style
    ('spark.hadoop.fs.s3a.path.style.access', 'true')
])

spark = SparkSession.builder.config(conf=conf).getOrCreate()
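Continuing from the session above, reading from the bucket is then an ordinary s3a read; for example (hypothetical bucket and prefix):

df = spark.read.parquet('s3a://my-bucket/my-prefix/')  # csv/json/parquet readers all work the same way
df.show(5)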

Thanks to @FelipeGonzalez for the spark.hadoop.fs.s3a.path.style.access hint. With path-style access the client addresses s3.amazonaws.com/<bucket> instead of <bucket>.s3.amazonaws.com, so the hostname presented during the TLS handshake matches the certificate's subject alternative names.
