Setting S3 bucket permissions when writing between two AWS accounts while running from Glue

Posted by wooyq4lh on 2021-05-24 in Spark


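I have a Scala jar that I call from an AWS Glue job. The jar writes a DataFrame to an S3 bucket in another AWS account that has KMS encryption turned on. I can write to the bucket, but I cannot grant the destination bucket owner permission to access the files. This works if I simply use the Glue writer, but with plain Spark it does not. I have read all the documentation, and I am setting the following bucket policies in the Hadoop configuration.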
def writeDataFrameInTargetLocation(sparkContext: SparkContext = null, dataFrame: DataFrame, location: String, fileFormat: String, saveMode: String, encryptionKey: Option[String] = Option.empty, kms_region: Option[String] = Option("us-west-2")): Unit = {
  if (encryptionKey.isDefined) {
    val region = if (kms_region.isDefined) kms_region.getOrElse("us-west-2") else "us-west-2"

    sparkContext.hadoopConfiguration.set("fs.s3.enableServerSideEncryption", "false")
    sparkContext.hadoopConfiguration.set("fs.s3.cse.enabled", "true")
    sparkContext.hadoopConfiguration.set("fs.s3.cse.encryptionMaterialsProvider", "com.amazon.ws.emr.hadoop.fs.cse.KMSEncryptionMaterialsProvider")
    sparkContext.hadoopConfiguration.set("fs.s3.cse.kms.keyId", encryptionKey.get) // KMS key to encrypt the data with
    sparkContext.hadoopConfiguration.set("fs.s3.cse.kms.region", region) // the region for the KMS key
    sparkContext.hadoopConfiguration.set("fs.s3.canned.acl", "BucketOwnerFullControl")
    sparkContext.hadoopConfiguration.set("fs.s3.acl.default", "BucketOwnerFullControl")
    sparkContext.hadoopConfiguration.set("fs.s3.acl", "bucket-owner-full-control")
    sparkContext.hadoopConfiguration.set("fs.s3.acl", "BucketOwnerFullControl")
  }
  else {
    sparkContext.hadoopConfiguration.set("fs.s3.canned.acl", "BucketOwnerFullControl")
    sparkContext.hadoopConfiguration.set("fs.s3.acl.default", "BucketOwnerFullControl")
    sparkContext.hadoopConfiguration.set("fs.s3.acl", "bucket-owner-full-control")
    sparkContext.hadoopConfiguration.set("fs.s3.acl", "BucketOwnerFullControl")
  }

  val writeDF = dataFrame
    .repartition(5)
    .write

  writeDF
    .mode(saveMode)
    .option(Header, true)
    .format(fileFormat)
    .save(location)
}

apache-spark aws-glue amazon-s3 amazon-web-services

Source: https://stackoverflow.com/questions/64094749/setting-s3-bucket-permissions-when-writing-between-2-aws-accounts-while-running

1 Answer


zour9fqk #1

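You are probably using the S3AFileSystem implementation for the "s3" scheme (that is, URLs of the form "s3://..."). You can check this by looking at sparkContext.hadoopConfiguration.get("fs.s3.impl"). If that is the case, then you actually need to set the Hadoop properties under "fs.s3a.*", not "fs.s3.*".
The correct settings would then be: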

sparkContext.hadoopConfiguration.set("fs.s3a.canned.acl", "BucketOwnerFullControl")
sparkContext.hadoopConfiguration.set("fs.s3a.acl.default", "BucketOwnerFullControl")

The S3AFileSystem implementation does not use any of the properties under "fs.s3". You can see this by examining the code around the following Hadoop source link: https://github.com/apache/hadoop/blob/43e8ac60971323054753bb0b21e52581f7996ece/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java#L268
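
The check and the settings above could be combined into one small helper. This is only a sketch: the object name S3AclConfig and the method setBucketOwnerFullControl are hypothetical names chosen for illustration, and it assumes that a "fs.s3.impl" value ending in "S3AFileSystem" is a good enough signal that the "fs.s3a.*" prefix applies. The property names and values are the ones given in this answer.

```scala
import org.apache.hadoop.conf.Configuration

// Hypothetical helper, not part of Spark, Glue, or Hadoop.
object S3AclConfig {

  // Picks the property prefix based on which FileSystem class is bound
  // to the "s3" scheme, sets the ACL properties from the answer under
  // that prefix, and returns the prefix that was used.
  def setBucketOwnerFullControl(conf: Configuration): String = {
    // Configuration.get returns null when the key is not set.
    val impl = Option(conf.get("fs.s3.impl")).getOrElse("")

    // Assumption: a class name ending in "S3AFileSystem" means the S3A
    // connector is serving s3:// URLs, so the fs.s3a.* keys apply.
    val prefix = if (impl.endsWith("S3AFileSystem")) "fs.s3a" else "fs.s3"

    conf.set(s"$prefix.canned.acl", "BucketOwnerFullControl")
    conf.set(s"$prefix.acl.default", "BucketOwnerFullControl")
    prefix
  }
}
```

It would be called once before the write, for example S3AclConfig.setBucketOwnerFullControl(sparkContext.hadoopConfiguration). Logging the returned prefix makes it easy to confirm which set of keys the job ended up using.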

Posted 2021-05-25
