HBase: is there a faster way to get all regions and their corresponding start and end keys?

Posted by 11dmarpk on 2021-06-09 in Hbase

I am trying to override the HBase method MultiTableInputFormat.getSplits(). My implementation is as follows:

public List<InputSplit> getSplits(JobContext context) throws IOException {
    List<Scan> scans = getScans();
    List<InputSplit> splits = new ArrayList<>();
    // All scans are assumed to target the same table; read its name from the first one.
    Scan sampleScan = scans.get(0);
    byte[] tableNameBytes = sampleScan.getAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME);
    TableName tableName = TableName.valueOf(tableNameBytes);

    // try-with-resources closes the connection, locator and admin when done.
    // The locator comes from the connection; casting a Table to RegionLocator is
    // unnecessary and fails on HBase 2.x, where HTable no longer implements RegionLocator.
    try (Connection conn = ConnectionFactory.createConnection(context.getConfiguration());
         RegionLocator regionLocator = conn.getRegionLocator(tableName);
         Admin admin = conn.getAdmin()) {
      // Start keys and end keys of every region, as two parallel arrays.
      Pair<byte[][], byte[][]> keys = regionLocator.getStartEndKeys();
      RegionSizeCalculator sizeCalculator = new RegionSizeCalculator(regionLocator, admin);
      int regionCount = keys.getFirst().length;

      for (int i = 0; i < regionCount; i++) {
        calculateSplits(
          keys.getFirst()[i],
          keys.getSecond()[i],
          regionLocator,
          sizeCalculator,
          splits
        );
      }
    }
    return splits;
  }

  private void calculateSplits(
    final byte[] startKey,
    final byte[] endKey,
    RegionLocator regionLocator,
    RegionSizeCalculator sizeCalculator,
    List<InputSplit> splits
  ) throws IOException {
    // Look up the location of the region that starts at startKey (one call per region).
    HRegionLocation hregionLocation = regionLocator.getRegionLocation(startKey, false);
    String regionHostname = hregionLocation.getHostname();
    HRegionInfo regionInfo = hregionLocation.getRegionInfo();
    // The region size does not depend on the scan, so fetch it once per region.
    long regionSize = sizeCalculator.getRegionSize(regionInfo.getRegionName());

    for (Scan scan : getScans()) {
      byte[] startRow = scan.getStartRow();
      byte[] stopRow = scan.getStopRow();

      // Determine whether the scan's [startRow, stopRow) overlaps the region's
      // [startKey, endKey); an empty key means "unbounded" on that side.
      if (
        (startRow.length == 0 || endKey.length == 0 || Bytes.compareTo(startRow, endKey) < 0) &&
        (stopRow.length == 0 || Bytes.compareTo(stopRow, startKey) > 0)
        ) {
        // Clip the split to the intersection of the scan range and the region range.
        byte[] splitStart = startRow.length == 0 || Bytes.compareTo(startKey, startRow) >= 0 ?
          startKey : startRow;
        byte[] splitStop =
          (stopRow.length == 0 || Bytes.compareTo(endKey, stopRow) <= 0) && endKey.length > 0 ?
            endKey : stopRow;

        TableSplit split = new TableSplit(
          regionLocator.getName(), scan, splitStart, splitStop, regionHostname, regionSize
        );
        splits.add(split);
      }
    }
  }

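The basic idea of this code is to get all regions together with their start and end keys. We also have a list of scans, and we check every scan against every region (scans × regions) to obtain all the splits. But the code is very slow, mainly because we have about 10,000 regions, so looking up and computing the information for each region takes a long time.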
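The per-pair check that calculateSplits() runs for every (scan, region) combination can be isolated and tried without a cluster. The sketch below is a minimal, hypothetical helper (KeyRangeClip, clip, compare and k are illustrative names, not HBase API); it assumes HBase's row-key order is unsigned lexicographic byte comparison, as Bytes.compareTo uses, and that an empty array means "unbounded".

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: clips a scan's [startRow, stopRow) to a region's [startKey, endKey). */
public final class KeyRangeClip {

    /** Unsigned lexicographic comparison, the row-key order HBase uses (like Bytes.compareTo). */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length; // a proper prefix sorts first
    }

    /** Returns {splitStart, splitStop}, or null when scan and region do not overlap. */
    static byte[][] clip(byte[] regionStart, byte[] regionEnd, byte[] scanStart, byte[] scanStop) {
        // Same overlap test as calculateSplits(): the scan starts before the region ends
        // and stops after the region starts (empty array = unbounded).
        boolean overlaps =
                (scanStart.length == 0 || regionEnd.length == 0 || compare(scanStart, regionEnd) < 0)
                && (scanStop.length == 0 || compare(scanStop, regionStart) > 0);
        if (!overlaps) {
            return null;
        }
        // Take the later of the two starts and the earlier of the two stops.
        byte[] splitStart = scanStart.length == 0 || compare(regionStart, scanStart) >= 0
                ? regionStart : scanStart;
        byte[] splitStop = (scanStop.length == 0 || compare(regionEnd, scanStop) <= 0)
                && regionEnd.length > 0 ? regionEnd : scanStop;
        return new byte[][] {splitStart, splitStop};
    }

    /** Encodes a string key as UTF-8 bytes. */
    static byte[] k(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Region [b, d) and scan [c, z) overlap in [c, d).
        byte[][] split = clip(k("b"), k("d"), k("c"), k("z"));
        System.out.println(new String(split[0], StandardCharsets.UTF_8)
                + " .. " + new String(split[1], StandardCharsets.UTF_8));
    }
}
```

Running main prints `c .. d`. The check itself is cheap; with roughly 10,000 regions, the cost the question describes comes from the per-region work done around it.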
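I noticed that RegionLocator also has a method called getAllRegionLocations(). I thought I could use it to fetch all regions at once and save a lot of time, but with that method I don't get the corresponding start and end keys, so I can't determine the range of each split. Does anyone have an idea for making this method faster?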

Answer #1, by zazmityj

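Solved! It turns out the start key and end key can be read from the region info. So first get the list of region locations (from getAllRegionLocations()), iterate over every HRegionLocation in it, and the second method becomes: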

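  // Called once for each HRegionLocation in the list returned by
  // regionLocator.getAllRegionLocations().
  private void calculateSplits(
    HRegionLocation hRegionLocation,
    RegionLocator regionLocator,
    RegionSizeCalculator sizeCalculator,
    List<InputSplit> splits
  ) throws IOException {
    String regionHostname = hRegionLocation.getHostname();
    HRegionInfo regionInfo = hRegionLocation.getRegionInfo();
    // The region's key range comes straight from its HRegionInfo,
    // so no per-region getRegionLocation() lookup is needed.
    final byte[] startKey = regionInfo.getStartKey();
    final byte[] endKey = regionInfo.getEndKey();
    ...
  }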
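The answer's approach can be modeled without an HBase cluster. The regions arrive as one list whose entries already carry host, start key and end key (the role HRegionLocation plus its HRegionInfo play after getAllRegionLocations()), and the splits come from a single pass over that list. RegionDesc, Split, computeSplits and the {startRow, stopRow} scan pairs are hypothetical stand-ins, not HBase types. Region sizes are left out, and the key order is assumed to be unsigned lexicographic, as with Bytes.compareTo.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the single-pass split computation over bulk-fetched regions. */
public final class SinglePassSplits {

    /** Stand-in for an HRegionLocation plus its HRegionInfo. */
    static final class RegionDesc {
        final String host;
        final byte[] startKey;
        final byte[] endKey;
        RegionDesc(String host, byte[] startKey, byte[] endKey) {
            this.host = host;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** Stand-in for a TableSplit: [start, stop) served by host. */
    static final class Split {
        final String host;
        final byte[] start;
        final byte[] stop;
        Split(String host, byte[] start, byte[] stop) {
            this.host = host;
            this.start = start;
            this.stop = stop;
        }
    }

    /** Unsigned lexicographic comparison, like HBase's Bytes.compareTo. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /** Each scan is a {startRow, stopRow} pair; an empty array means unbounded. */
    static List<Split> computeSplits(List<RegionDesc> regions, List<byte[][]> scans) {
        List<Split> splits = new ArrayList<>();
        for (RegionDesc r : regions) { // keys are read from the entry itself, no lookup
            for (byte[][] scan : scans) {
                byte[] startRow = scan[0];
                byte[] stopRow = scan[1];
                boolean overlaps =
                        (startRow.length == 0 || r.endKey.length == 0 || compare(startRow, r.endKey) < 0)
                        && (stopRow.length == 0 || compare(stopRow, r.startKey) > 0);
                if (!overlaps) {
                    continue;
                }
                byte[] start = startRow.length == 0 || compare(r.startKey, startRow) >= 0
                        ? r.startKey : startRow;
                byte[] stop = (stopRow.length == 0 || compare(r.endKey, stopRow) <= 0)
                        && r.endKey.length > 0 ? r.endKey : stopRow;
                splits.add(new Split(r.host, start, stop));
            }
        }
        return splits;
    }

    static byte[] k(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<RegionDesc> regions = new ArrayList<>();
        regions.add(new RegionDesc("host1", new byte[0], k("g")));
        regions.add(new RegionDesc("host2", k("g"), k("p")));
        regions.add(new RegionDesc("host3", k("p"), new byte[0]));
        List<byte[][]> scans = new ArrayList<>();
        scans.add(new byte[][] {k("h"), new byte[0]}); // scan from "h" to the end of the table
        for (Split s : computeSplits(regions, scans)) {
            System.out.println(s.host + " [" + new String(s.start, StandardCharsets.UTF_8)
                    + ", " + new String(s.stop, StandardCharsets.UTF_8) + ")");
        }
    }
}
```

With three regions and one scan starting at "h", the first region is skipped and two splits come out: [h, p) on host2 and [p, end) on host3. The work stays proportional to regions × scans, but each region costs only in-memory comparisons.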
