HBase scan is returning deleted rows

I am using a SingleColumnValueFilter to return the list of rows I want to delete:

SingleColumnValueFilter fileTimestampFilter = new SingleColumnValueFilter(
         Bytes.toBytes("a"),
         Bytes.toBytes("date"),
         CompareFilter.CompareOp.GREATER,
         Bytes.toBytes("20140101000000")
         );

I then create a Delete object and delete the column from each row.

Delete delete = new Delete(Bytes.toBytes(rowKey));
delete.deleteColumn(Bytes.toBytes("a"), Bytes.toBytes("date"));
htable.delete(delete);

The retrieval code is:

private List<String> getRecordsToDelete(long maxResultSize)
{
  ResultScanner rs = null;
  HTableInterface table = null;
  List<String> keyList = new ArrayList<String>();
  try
  {
    log.debug("Retrieving records");      
    HbaseConnection hbaseConnectionConfig = myConfig.getHbaseConnection();
    Configuration configuration = getHbaseConfiguration(hbaseConnectionConfig);
    table = new HTable(configuration, "mytable");
    FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    Filter filter = HbaseDao.getFilter();
    list.addFilter(filter);
    list.addFilter(new PageFilter(maxResultSize));
    Scan scan = new Scan();
    scan.setFilter(list);
    //scan.setMaxResultSize(maxResultSize);
    //scan.setCaching(1);
    //scan.setCacheBlocks(false);
    //log.debug("Scan raw? = " + scan.isRaw());
    //scan.setRaw(false);
    rs = table.getScanner(scan);      
    Iterator<Result> iterator = rs.iterator();      
    while (iterator.hasNext())
    {        
      Result result = iterator.next();        
      String key = Bytes.toString(result.getRow());
      log.debug("****************f key = " + key); //the same keys are always added here
      keyList.add(key);        
    }
    log.debug("Done processing retrieval of records to delete Size = " + keyList.size());
  }
  catch (Exception ex)
  {
    log.error("Unable to process retrieval of records.", ex);
  }
  finally
  {
    try
    {
      //close the scanner first so it is released even if table.close() throws
      if (rs != null)
      {
        rs.close();
      }
      if (table != null)
      {
        table.close();
      }
    }
    catch (IOException ioEx)
    {
      //do nothing
      log.error(ioEx);
    }
  }
  return keyList;
}

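This task is scheduled, and when it runs again it retrieves the same rows. I know that HBase only marks rows for deletion and that they are physically removed after a major compaction. If I query the rows through the HBase shell between runs of the task, the column has definitely been deleted. Why does my scan return the same rows on subsequent runs of this task?
Thanks in advance!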

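This has nothing to do with major compactions (which by default run periodically, every 24 hours in older HBase releases). Once data has been deleted, HBase ignores it on reads until it is finally removed by a major compaction. Note that if autoflush is not enabled, you must first flush the client-side write buffer by calling htable.flushCommits() (autoflush is on by default).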
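Your problem is probably that you only delete a:date while your rows contain other columns that are still being read. Those rows pass the filter because, when the tested column has no value, SingleColumnValueFilter lets the row through by default.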
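The default behaviour described above can be illustrated with a small toy model. This is only a sketch in plain Java, not the HBase API: the class MissingColumnFilterSketch, its scan method, and the three sample rows are all hypothetical, and string comparison stands in for HBase's byte comparison. The filterIfMissing parameter mirrors the flag of the same name on SingleColumnValueFilter, which is off by default.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: this is NOT the HBase API. It mimics how a single-column
// value filter treats rows that no longer contain the tested column.
class MissingColumnFilterSketch {

    // Returns the keys of the rows that the filter lets through.
    // rows maps rowKey -> (column -> value).
    static List<String> scan(Map<String, Map<String, String>> rows,
                             String column, String threshold,
                             boolean filterIfMissing) {
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            String value = row.getValue().get(column);
            if (value == null) {
                // Column absent: the row passes unless filterIfMissing is set.
                if (!filterIfMissing) {
                    matched.add(row.getKey());
                }
            } else if (value.compareTo(threshold) > 0) {
                // Column present: apply the GREATER comparison.
                matched.add(row.getKey());
            }
        }
        return matched;
    }

    // Hypothetical sample data: row1 has already lost a:date but keeps a:name.
    static Map<String, Map<String, String>> sampleRows() {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();

        Map<String, String> row1 = new LinkedHashMap<>();
        row1.put("a:name", "first");            // a:date was deleted earlier
        rows.put("row1", row1);

        Map<String, String> row2 = new LinkedHashMap<>();
        row2.put("a:date", "20150101000000");   // newer than the threshold
        rows.put("row2", row2);

        Map<String, String> row3 = new LinkedHashMap<>();
        row3.put("a:date", "20130101000000");   // older than the threshold
        rows.put("row3", row3);

        return rows;
    }
}
```

With the threshold 20140101000000 from the question, the model returns row1 and row2 when filterIfMissing is false, so the row whose a:date was already deleted keeps coming back. With filterIfMissing set to true only row2 is returned.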
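If you want to delete the entire row, just remove the line delete.deleteColumn(Bytes.toBytes("a"), Bytes.toBytes("date")); so that the Delete applies to the whole row rather than to a single column.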
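If you only want to delete the a:date column and leave the rest of the row intact, set the filterIfMissing flag so that rows where a:date is missing (because it has been deleted) no longer pass the filter: filter.setFilterIfMissing(true); (this method belongs to SingleColumnValueFilter, so call it on a reference of that type).
Alternatively, for best performance, add only that column to the scan, which prevents the other columns from being read: scan.addColumn(Bytes.toBytes("a"), Bytes.toBytes("date"));
On another note, be aware that list.addFilter(new PageFilter(maxResultSize)); is applied separately in each region, so the scan can return up to maxResultSize results from every region of the table. To cap the total, you have to implement the limit yourself in the iterator loop by breaking out of it once keyList reaches maxResultSize.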
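The manual limit mentioned above might look like the following minimal sketch. It is written against a plain Iterator<String> so that it runs without a cluster; the class ScanLimitSketch and the method takeAtMost are hypothetical names, and in the question's code the same loop condition would be applied to the Iterator<Result> obtained from the ResultScanner.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch only: a plain Iterator<String> stands in for the row keys that
// would come out of the ResultScanner in the question.
class ScanLimitSketch {

    // Collects at most maxResultSize keys and stops early, even if the
    // iterator could still supply more (for example because every region
    // returned its own page of results).
    static List<String> takeAtMost(Iterator<String> keys, long maxResultSize) {
        List<String> keyList = new ArrayList<>();
        while (keys.hasNext() && keyList.size() < maxResultSize) {
            keyList.add(keys.next());
        }
        return keyList;
    }
}
```

For example, if two regions each return a page of two keys, the iterator yields four keys in total, and takeAtMost(iterator, 2) keeps only the first two.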
One more tip: when logging for debugging purposes, always log the full Result so that you can see exactly what it contains.
