Describe the problem you are facing: how can I change the location of a Hudi table to a new location? My Customer table is stored at s3://aws-amazon-com/Customer/ and I want to change it to s3://aws-amazon-com/CustomerUpdated/. I am running on Glue 4,
using these jars: hudi-spark3-bundle_2.12-0.12.1.jar, calcite-core-1.16.0.jar, libfb303-0.9.3.jar
import org.apache.hudi.DataSourceWriteOptions
import org.apache.spark.sql.SaveMode
import spark.implicits._

val partitionColumnName: String = "year"
val hudiTableName: String = "Customer"
val preCombineKey: String = "id"
val recordKey = "id"
val tablePath = "s3://aws-amazon-com/Customer/"
val databaseName="consumer_bureau"
val hudiCommonOptions: Map[String, String] = Map(
"hoodie.table.name" -> hudiTableName,
"hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.ComplexKeyGenerator",
"hoodie.datasource.write.precombine.field" -> preCombineKey,
"hoodie.datasource.write.recordkey.field" -> recordKey,
"hoodie.datasource.write.operation" -> "bulk_insert",
//"hoodie.datasource.write.operation" -> "upsert",
"hoodie.datasource.write.row.writer.enable" -> "true",
"hoodie.datasource.write.reconcile.schema" -> "true",
"hoodie.datasource.write.partitionpath.field" -> partitionColumnName,
"hoodie.datasource.write.hive_style_partitioning" -> "true",
// "hoodie.bulkinsert.shuffle.parallelism" -> "2000",
// "hoodie.upsert.shuffle.parallelism" -> "400",
"hoodie.datasource.hive_sync.enable" -> "true",
"hoodie.datasource.hive_sync.table" -> hudiTableName,
"hoodie.datasource.hive_sync.database" -> databaseName,
"hoodie.datasource.hive_sync.partition_fields" -> partitionColumnName,
"hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.MultiPartKeysValueExtractor",
"hoodie.datasource.hive_sync.use_jdbc" -> "false",
"hoodie.combine.before.upsert" -> "true",
"hoodie.index.type" -> "BLOOM",
"spark.hadoop.parquet.avro.write-old-list-structure" -> "false",
DataSourceWriteOptions.TABLE_TYPE.key() -> "COPY_ON_WRITE"
)
val df=Seq((1,"Mark",1990),(2,"Martin",2009)).toDF("id","name","year")
df.write.format("org.apache.hudi")
.options(hudiCommonOptions)
.mode(SaveMode.Append)
.save(tablePath)
val tablelocationUpdated="s3://eec-aws-uk-ukidcibatchanalytics-prod-hudi-replication/consumer_bureau/production/CustomerUpdated/"
df.write.format("org.apache.hudi") //writng to new location
.options(hudiCommonOptions)
.mode(SaveMode.Append)
.save(tablelocationUpdated)
When I query in Athena, the customer table still points to s3://aws-amazon-com/Customer/ instead of the expected updated location s3://aws-amazon-com/CustomerUpdated/. Can the table location change be done with AWS Glue or AWS Lambda?
Please help.
2 Answers
dgsult0t1#
Yes, you can change the location of a Hudi table, but you also need to manually change the table's location path in Glue (for example via the AWS console or using the AWS SDK). Hive sync will not update the location on its own.
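To make the SDK route concrete, here is a minimal sketch (my illustration, not part of the original answer) using the AWS SDK for Java v2 Glue client from Scala: fetch the current table definition, copy it into a TableInput whose storage descriptor points at the new prefix, and call updateTable. The database/table names and the target path are taken from the question; note that Glue stores table names in lower case.

import software.amazon.awssdk.services.glue.GlueClient
import software.amazon.awssdk.services.glue.model.{GetTableRequest, TableInput, UpdateTableRequest}

// Sketch only: repoint the Glue Data Catalog entry to the new S3 prefix.
val glue = GlueClient.create()

// Fetch the current table definition so it can be copied into the update request.
val current = glue.getTable(
  GetTableRequest.builder()
    .databaseName("consumer_bureau")
    .name("customer")
    .build()
).table()

// Rebuild the storage descriptor with the new location, keeping everything else as-is.
val newSd = current.storageDescriptor().toBuilder()
  .location("s3://aws-amazon-com/CustomerUpdated/")
  .build()

val tableInput = TableInput.builder()
  .name(current.name())
  .tableType(current.tableType())
  .parameters(current.parameters())
  .partitionKeys(current.partitionKeys())
  .storageDescriptor(newSd)
  .build()

glue.updateTable(
  UpdateTableRequest.builder()
    .databaseName("consumer_bureau")
    .tableInput(tableInput)
    .build()
)

Note that this only updates the table-level location: existing partition entries in the catalog still point at the old prefix, so a partitioned table also needs its partitions updated (or re-synced, for example by re-running Hive sync against the new path).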
mwecs4sa2#
… will change the table location of the Hudi table.
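The opening of this answer appears to have been lost in extraction. One common way to repoint the catalog entry (an assumption on my part, not confirmed by the original answer) is an ALTER TABLE ... SET LOCATION statement. A minimal Spark SQL sketch, assuming a SparkSession named spark that uses the Glue Data Catalog as its metastore, and reusing the database, table name, and path from the question:

// Sketch only: update the table-level location in the Glue Data Catalog.
// Glue lower-cases table names, hence "customer" rather than "Customer".
spark.sql("ALTER TABLE consumer_bureau.customer SET LOCATION 's3://aws-amazon-com/CustomerUpdated/'")

The same statement can also be run directly in Athena. As with the SDK sketch above, this does not rewrite partition-level locations.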