The Spark SQL DELETE command does not delete a row in Apache Iceberg, does it?

0lvr5msh  asked on 2021-07-03  in Java

I am using Spark SQL 3.0 with Scala 2.12. I can insert data into an Iceberg table and read it back successfully, but when I try to delete a single bad record from the table with Spark SQL, the log shows an exception. Issue 1444 of Apache Iceberg on GitHub indicates that Iceberg supports row-level deletes in a recent release, so why does my delete fail? The main Iceberg version I use is 0.10.0; the org.apache.iceberg:iceberg-hive package is at version 0.9.1. Please help! My Spark SQL code snippet is:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public static void deleteSingleDataWithoutCatalog3() {
    // Spark SQL configuration
    SparkConf sparkSQLConf = new SparkConf();
    // 'hadoop_prod' is the name of the catalog, which is used when accessing the table
    sparkSQLConf.set("spark.sql.catalog.hadoop_prod", "org.apache.iceberg.spark.SparkCatalog");
    sparkSQLConf.set("spark.sql.catalog.hadoop_prod.type", "hadoop");
    sparkSQLConf.set("spark.sql.catalog.hadoop_prod.warehouse", "hdfs://hadoop01:9000/warehouse_path/");
    sparkSQLConf.set("spark.sql.sources.partitionOverwriteMode", "dynamic");

    SparkSession spark = SparkSession.builder().config(sparkSQLConf).master("local[2]").getOrCreate();
    // String selectDataSQLALL = "select * from hadoop_prod.xgfying.booksSpark3";
    String deleteSingleDataSQL = "DELETE FROM hadoop_prod.xgfying.booksSpark3 where price=33";
    spark.table("hadoop_prod.xgfying.booksSpark3").show();
    spark.sql(deleteSingleDataSQL);
    spark.table("hadoop_prod.xgfying.booksSpark3").show();
}

When the code runs, the exception message is:

......
Exception in thread "main" java.lang.IllegalArgumentException: Failed to cleanly delete data files matching: ref(name="price") == 33
        at org.apache.iceberg.spark.source.SparkTable.deleteWhere(SparkTable.java:168)
......
Caused by: org.apache.iceberg.exceptions.ValidationException: Cannot delete file where some, but not all, rows match filter ref(name="price") == 33: hdfs://hadoop01:9000/warehouse_path/xgfying/booksSpark3/data/title=Gone/00000-1-9070110f-35f8-4ee5-8047-cca2a1caba1f-00001.parquet
......
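For context: in Iceberg 0.10.0, Spark 3's `DELETE FROM` is routed through `SparkTable.deleteWhere`, which performs a metadata-only delete, i.e. it can only drop whole data files whose rows all match the filter. That is exactly what the `ValidationException` above is saying: the file under `title=Gone` contains some rows with `price=33` and some without, so it cannot be dropped without rewriting it. A delete whose predicate aligns with file or partition boundaries can still succeed. Since the stack trace shows the table is partitioned by `title`, a hedged sketch (the value `'Gone'` is taken from the partition path in the trace, not from any known data):

```sql
-- Metadata-only delete: the predicate covers the entire title=Gone
-- partition, so Iceberg can drop its data files without rewriting rows.
DELETE FROM hadoop_prod.xgfying.booksSpark3 WHERE title = 'Gone';
```

For predicates like `price=33` that match only some rows within a file, the options at the time were to rewrite the affected data yourself (read, filter, and overwrite via the DataFrame API) or to upgrade to a later Iceberg release whose Spark session extensions (`org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions`, enabled via `spark.sql.extensions`) add true row-level `DELETE` support.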
