Unable to use multiple KMS keys in AvroParquetWriter with SSE Hadoop configuration

Posted by wmomyfyw on 2023-11-16 in Hadoop


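I have a Java application running on an AWS EC2 instance (not a Hadoop cluster). It uses the parquet-hadoop/avro libraries to create AvroParquetWriters that generate Parquet files, which are then written to a bucket in S3. I create several AvroParquetWriters with different configurations, each specifying a different KMS key for encryption, but every file that gets created is encrypted with the same KMS key (whichever key was used first in a configuration).
Here is how I create the Configurations and Writers: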

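import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetFileWriter;
import org.apache.parquet.hadoop.ParquetWriter;

// Configuration for files that should be encrypted with the first KMS key
Configuration conf1 = new Configuration();

conf1.set("fs.s3a.server-side-encryption.key", awsKmsId1);
conf1.set("fs.s3a.server-side-encryption-algorithm", "SSE-KMS");
conf1.set("fs.s3a.connection.ssl.enabled", "true");
conf1.set("fs.s3a.endpoint", s3Endpoint);

// Configuration for files that should be encrypted with the second KMS key
Configuration conf2 = new Configuration();

conf2.set("fs.s3a.server-side-encryption.key", awsKmsId2);
conf2.set("fs.s3a.server-side-encryption-algorithm", "SSE-KMS");
conf2.set("fs.s3a.connection.ssl.enabled", "true");
conf2.set("fs.s3a.endpoint", s3Endpoint);

// path1 and path2: two distinct s3a:// output paths for the two files
ParquetWriter<GenericRecord> writer1 = AvroParquetWriter.<GenericRecord>builder(path1)
                    .withSchema(parquetSchema)
                    .withConf(conf1)
                    .withWriteMode(ParquetFileWriter.Mode.CREATE)
                    .build();

ParquetWriter<GenericRecord> writer2 = AvroParquetWriter.<GenericRecord>builder(path2)
                    .withSchema(parquetSchema)
                    .withConf(conf2)
                    .withWriteMode(ParquetFileWriter.Mode.CREATE)
                    .build();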

writer1 and writer2 create different files, but both files are encrypted with the awsKmsId1 key, even though I specified a different key for each.


Source: https://stackoverflow.com/questions/67892052/unable-to-use-multiple-kms-keys-in-avroparquetwriter-with-hadoop-configuration-f

1 Answer


fnx2tebb (answer #1)

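I found a solution to this problem! It is caused by the FileSystem cache in hadoop-common (3.3.0). The cache key is not built from the Configuration object; it is built from the URI (its scheme and authority, i.e. the bucket). So when the second writer asks the cache for a FileSystem, it gets back the old FileSystem that was created with the first configuration, because the URI scheme is the same. I fixed this by disabling the cache with conf.set("fs.s3a.impl.disable.cache", "true");. The problem is described in this Apache Jira issue.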
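The caching behaviour described above can be illustrated with a small, dependency-free Java model. This is a simplified sketch, not Hadoop's actual code: `FsCacheModel` and `FakeFileSystem` are hypothetical names, configurations are plain `Map`s, and the cache key is only scheme + authority (Hadoop's real key also includes the current user). The `fs.<scheme>.impl.disable.cache` lookup mirrors the property named in the answer.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class FsCacheModel {

    // Hypothetical stand-in for a Hadoop FileSystem: it remembers the KMS key
    // from the configuration it was created with, and never looks at it again.
    static final class FakeFileSystem {
        final String kmsKey;

        FakeFileSystem(String kmsKey) {
            this.kmsKey = kmsKey;
        }
    }

    // Cache keyed only by "scheme://authority"; the configuration is NOT part of the key.
    // (Hadoop's real key also includes the user; omitted here for brevity.)
    private final Map<String, FakeFileSystem> cache = new HashMap<>();

    FakeFileSystem get(URI uri, Map<String, String> conf) {
        String kmsKey = conf.get("fs.s3a.server-side-encryption.key");

        // Per-scheme switch, modelled on fs.<scheme>.impl.disable.cache:
        // when true, always build a fresh instance from this configuration.
        boolean disableCache = Boolean.parseBoolean(
                conf.getOrDefault("fs." + uri.getScheme() + ".impl.disable.cache", "false"));
        if (disableCache) {
            return new FakeFileSystem(kmsKey);
        }

        // Otherwise the first configuration seen for this scheme + bucket wins.
        String cacheKey = uri.getScheme() + "://" + uri.getAuthority();
        return cache.computeIfAbsent(cacheKey, k -> new FakeFileSystem(kmsKey));
    }

    public static void main(String[] args) {
        FsCacheModel fs = new FsCacheModel();
        URI file1 = URI.create("s3a://my-bucket/out/file1.parquet");
        URI file2 = URI.create("s3a://my-bucket/out/file2.parquet");

        Map<String, String> conf1 = Map.of("fs.s3a.server-side-encryption.key", "key-1");
        Map<String, String> conf2 = Map.of("fs.s3a.server-side-encryption.key", "key-2");

        // Same scheme and bucket: the second lookup reuses the instance built from conf1.
        System.out.println(fs.get(file1, conf1).kmsKey);
        System.out.println(fs.get(file2, conf2).kmsKey);

        // With the cache disabled, the lookup builds a new instance from its own config.
        Map<String, String> conf3 = Map.of(
                "fs.s3a.server-side-encryption.key", "key-2",
                "fs.s3a.impl.disable.cache", "true");
        System.out.println(fs.get(file2, conf3).kmsKey);
    }
}
```

Running `main` prints `key-1`, `key-1`, `key-2`: the second lookup silently reuses the instance created with the first key, which matches the symptom in the question, and only the uncached lookup picks up the second key.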

2023-11-16
