Strange behavior of GeoMesa HBase queries

efzxgjgh  posted on 2021-06-07  in HBase

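I have a question about HBase queries. I am seeing a large amount of data scanned for a small spatial query. I ran a geospatial query against the OSMNodes table; the query and table details are below. HBase reports a total of 5,553,421,708 read requests, and the requests hit most regions and region servers. Why does this query scan every region (the entire table)?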


**Query**:

"DWITHIN(geometry, POINT(-122.332426 47.607282), 50, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31' AND nextTimestamp > '2020-05-27 16:59:31'"

**Table Schema:**

geomesa-hbase describe-schema -c atlas -f OSMNodes
INFO  Describing attributes of feature 'OSMNodes'
geometry           | Point     (Spatio-temporally indexed)
ingestionTimestamp | Timestamp (Spatio-temporally indexed)
nextTimestamp      | Timestamp 
serializerVersion  | String    
featurePayload     | String    

User data:
  geomesa.index.dtg    | ingestionTimestamp
  geomesa.indices      | z3:6:3:geometry:ingestionTimestamp,id:4:3:
  geomesa.stats.enable | true
  geomesa.z.splits     | 60

**Query Plan (through GeoMesa CLI):**

geomesa-hbase explain -c atlas -f OSMNodes -q "DWITHIN(geometry, POINT(-122.332426 47.607282), 50, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31' AND nextTimestamp > '2020-05-27 16:59:31'"

Planning 'OSMNodes' (DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00) AND nextTimestamp > 2020-05-27T16:59:31+00:00
  Original filter: (DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31') AND nextTimestamp > '2020-05-27 16:59:31'
  Hints: bin[false] arrow[false] density[false] stats[false] sampling[none]
  Sort: none
  Transforms: none
  Strategy selection:
    Query processing took 17ms for 1 options
    Filter plan: FilterPlan[Z3Index(geometry,ingestionTimestamp)[DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00][nextTimestamp > 2020-05-27T16:59:31+00:00]]
    Strategy selection took 1ms for 1 options
  Strategy 1 of 1: Z3Index(geometry,ingestionTimestamp)
    Strategy filter: Z3Index(geometry,ingestionTimestamp)[DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00][nextTimestamp > 2020-05-27T16:59:31+00:00]
    Geometries: FilterValues(List(POLYGON ((-122.3317610175119 47.607282, -122.33177379496394 47.60715226835226, -122.33181163628976 47.607027522218985, -122.33187308726842 47.606912555524126, -122.33195578637329 47.606811786373285, -122.33205655552413 47.60672908726843, -122.33217152221899 47.60666763628976, -122.33229626835225 47.606629794963936, -122.332426 47.606617017511894, -122.33255573164774 47.606629794963936, -122.33268047778101 47.60666763628976, -122.33279544447586 47.60672908726843, -122.33289621362671 47.606811786373285, -122.33297891273158 47.606912555524126, -122.33304036371024 47.607027522218985, -122.33307820503606 47.60715226835226, -122.3330909824881 47.607282, -122.33307820503606 47.60741173164774, -122.33304036371024 47.60753647778101, -122.33297891273158 47.60765144447587, -122.33289621362671 47.60775221362671, -122.33279544447586 47.60783491273157, -122.33268047778101 47.60789636371024, -122.33255573164774 47.60793420503606, -122.332426 47.6079469824881, -122.33229626835225 47.60793420503606, -122.33217152221899 47.60789636371024, -122.33205655552413 47.60783491273157, -122.33195578637329 47.60775221362671, -122.33187308726842 47.60765144447587, -122.33181163628976 47.60753647778101, -122.33177379496394 47.60741173164774, -122.3317610175119 47.607282))),true,false)
    Intervals: FilterValues(List((-∞,2020-05-27T16:59:31Z]),true,false)
    Plan: ScanPlan
      Tables: atlas_OSMNodes_z3_geometry_ingestionTimestamp_v6
      Ranges (7440): [%00;%0a;E$A%08;%00;%00;%00;%00;%00;::%00;%0a;E$A%0c;], [%01;%0a;E$A%08;%00;%00;%00;%00;%00;::%01;%0a;E$A%0c;], [%02;%0a;E$A%08;%00;%00;%00;%00;%00;::%02;%0a;E$A%0c;], [%03;%0a;E$A%08;%00;%00;%00;%00;%00;::%03;%0a;E$A%0c;], [%04;%0a;E$A%08;%00;%00;%00;%00;%00;::%04;%0a;E$A%0c;]
      Scans (120): ['%0a;ElA%98;%00;%00;%00;%00;%00;::'%0a;Ema%8c;], [:%0a;ElA%98;%00;%00;%00;%00;%00;:::%0a;Ema%8c;], [%14;::%14;%0a;ElA%8c;], [(%0a;ElA%98;%00;%00;%00;%00;%00;::(%0a;Ema%8c;], [%12;%0a;ElA%98;%00;%00;%00;%00;%00;::%12;%0a;Ema%8c;]
      Column families: d
      Remote filters: MultiRowRangeFilter, Z3HBaseFilter[(epoch,2629:2629),(zt,0:2009670),(zxy,335934:1603233:335941:1603248)], CqlFilter[(DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00) AND nextTimestamp > 2020-05-27T16:59:31+00:00]
    Plan creation took 135ms
  Query planning took 433ms

Another query [1]
While experimenting, I found that adding a lower bound on the timestamp reduced the latency from 2-3 hours to 10-20 minutes.

geomesa-hbase explain -c atlas -f OSMNodes -q "DWITHIN(geometry, POINT(-122.332426 47.607282), 50, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31' AND ingestionTimestamp >= '2019-05-27 16:59:31' AND nextTimestamp > '2020-05-27 16:59:31'"

Planning 'OSMNodes' ((DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00) AND ingestionTimestamp >= 2019-05-27T16:59:31+00:00) AND nextTimestamp > 2020-05-27T16:59:31+00:00
  Original filter: ((DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31') AND ingestionTimestamp >= '2019-05-27 16:59:31') AND nextTimestamp > '2020-05-27 16:59:31'
  Hints: bin[false] arrow[false] density[false] stats[false] sampling[none]
  Sort: none
  Transforms: none
  Strategy selection:
    Query processing took 24ms for 1 options
    Filter plan: FilterPlan[Z3Index(geometry,ingestionTimestamp)[DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND (ingestionTimestamp >= 2019-05-27T16:59:31+00:00 AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00)][nextTimestamp > 2020-05-27T16:59:31+00:00]]
    Strategy selection took 2ms for 1 options
  Strategy 1 of 1: Z3Index(geometry,ingestionTimestamp)
    Strategy filter: Z3Index(geometry,ingestionTimestamp)[DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND (ingestionTimestamp >= 2019-05-27T16:59:31+00:00 AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00)][nextTimestamp > 2020-05-27T16:59:31+00:00]
    Geometries: FilterValues(List(POLYGON ((-122.3317610175119 47.607282, -122.33177379496394 47.60715226835226, -122.33181163628976 47.607027522218985, -122.33187308726842 47.606912555524126, -122.33195578637329 47.606811786373285, -122.33205655552413 47.60672908726843, -122.33217152221899 47.60666763628976, -122.33229626835225 47.606629794963936, -122.332426 47.606617017511894, -122.33255573164774 47.606629794963936, -122.33268047778101 47.60666763628976, -122.33279544447586 47.60672908726843, -122.33289621362671 47.606811786373285, -122.33297891273158 47.606912555524126, -122.33304036371024 47.607027522218985, -122.33307820503606 47.60715226835226, -122.3330909824881 47.607282, -122.33307820503606 47.60741173164774, -122.33304036371024 47.60753647778101, -122.33297891273158 47.60765144447587, -122.33289621362671 47.60775221362671, -122.33279544447586 47.60783491273157, -122.33268047778101 47.60789636371024, -122.33255573164774 47.60793420503606, -122.332426 47.6079469824881, -122.33229626835225 47.60793420503606, -122.33217152221899 47.60789636371024, -122.33205655552413 47.60783491273157, -122.33195578637329 47.60775221362671, -122.33187308726842 47.60765144447587, -122.33181163628976 47.60753647778101, -122.33177379496394 47.60741173164774, -122.3317610175119 47.607282))),true,false)
    Intervals: FilterValues(List([2019-05-27T16:59:31Z,2020-05-27T16:59:31Z]),true,false)
    Plan: ScanPlan
      Tables: atlas_OSMNodes_z3_geometry_ingestionTimestamp_v6
      Ranges (404100): [%00;%0a;4$A%08;%00;%00;%00;%00;%00;::%00;%0a;4$A%0c;], [%01;%0a;4$A%08;%00;%00;%00;%00;%00;::%01;%0a;4$A%0c;], [%02;%0a;4$A%08;%00;%00;%00;%00;%00;::%02;%0a;4$A%0c;], [%03;%0a;4$A%08;%00;%00;%00;%00;%00;::%03;%0a;4$A%0c;], [%04;%0a;4$A%08;%00;%00;%00;%00;%00;::%04;%0a;4$A%0c;]
      Scans (4080): [2%0a;/la%08;%00;%00;%00;%00;%00;::2%0a;0da%9c;], [%18;%0a;$mA%08;%00;%00;%00;%00;%00;::%18;%0a;%eA%9c;], [%03;%0a;@%e%08;%00;%00;%00;%00;%00;::%03;%0a;@me%9c;], [%0f;%0a;%eE%08;%00;%00;%00;%00;%00;::%0f;%0a;&-E%9c;], ['%0a;Bda%08;%00;%00;%00;%00;%00;::'%0a;C,a%9c;]
      Column families: d
      Remote filters: MultiRowRangeFilter, Z3HBaseFilter[(epoch,2577:2629),(zt,1410483:2097151,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0:2009670),(zxy,335934:1603233:335941:1603248)], CqlFilter[(DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND (ingestionTimestamp >= 2019-05-27T16:59:31+00:00 AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00)) AND nextTimestamp > 2020-05-27T16:59:31+00:00]
    Plan creation took 475ms
  Query planning took 813ms

What is the reason for such a large difference?

3htmauhk


First, thank you for providing the index information and the query explain output. That makes the question much easier to answer.
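When the Z3 index is used and the date range has only an upper bound (or, similarly, only a lower bound), the open-ended interval, shown in the first plan as (-∞,2020-05-27T16:59:31Z], implicates ranges across the index space. For each split, the same pattern of Z3 ranges is scanned, so having 60 splits (geomesa.z.splits = 60) leads to many ranges to scan, and those ranges are very likely spread across the HBase cluster.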
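As a rough check of the per-split repetition, the range and scan counts from the two explain outputs in the question can be divided by the split count. This is only arithmetic on numbers copied from the question; the names `plans` and `per_split` are made up for this sketch and are not part of GeoMesa.

```python
# Counts copied from the two explain outputs in the question.
SPLITS = 60  # geomesa.z.splits from the schema's user data

plans = {
    "upper bound only": {"ranges": 7440, "scans": 120},
    "upper and lower bound": {"ranges": 404100, "scans": 4080},
}


def per_split(count, splits=SPLITS):
    """Return count divided by splits, failing if it does not divide evenly."""
    quotient, remainder = divmod(count, splits)
    if remainder:
        raise ValueError(f"{count} is not a multiple of {splits}")
    return quotient


for name, plan in plans.items():
    # Every count divides evenly by 60, consistent with one copy
    # of the same range pattern per split.
    print(name, per_split(plan["ranges"]), per_split(plan["scans"]))
```

Note that the number of ranges is not the same as the amount of data read. The upper-bound-only plan has fewer ranges, yet one of its listed scans, [%14;::%14;%0a;ElA%8c;], has no key bytes after the split prefix at its start. That would be consistent with scanning each split from its beginning, though the explain output alone does not confirm how much data such a scan covers.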
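There are a few things you could try:

1. Re-ingest the data with fewer splits, so that fewer ranges need to be scanned.
2. Add a Z2 (spatial-only) index to help with this kind of query. The spatial predicate will return some records, which are then filtered further.
3. Work out whether you can add a lower time bound. Admittedly this may not be possible, but in some use cases it does make sense.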
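To illustrate why a lower time bound helps, here is a minimal sketch that counts the time bins each interval touches. It assumes weekly bins counted from 1970-01-01 UTC; that assumption reproduces the epoch values 2577 and 2629 shown in the Z3HBaseFilter of the second plan, but the helper `week_bin` is hypothetical and not a GeoMesa API.

```python
from datetime import datetime, timezone

# Assumption: Z3 time bins are whole weeks counted from the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WEEK_SECONDS = 7 * 24 * 3600


def week_bin(ts):
    """Return the index of the week (since 1970-01-01 UTC) containing ts."""
    return int((ts - EPOCH).total_seconds()) // WEEK_SECONDS


upper = datetime(2020, 5, 27, 16, 59, 31, tzinfo=timezone.utc)
lower = datetime(2019, 5, 27, 16, 59, 31, tzinfo=timezone.utc)

# With both bounds, only the bins between the two timestamps are involved.
bins_bounded = week_bin(upper) - week_bin(lower) + 1

# With only an upper bound, every bin from 0 up to the upper bin is a candidate.
bins_unbounded = week_bin(upper) + 1

print(week_bin(lower), week_bin(upper), bins_bounded, bins_unbounded)
```

Under this assumption the bounded query is limited to one year of weekly bins, while the query with only an upper bound has to consider every bin back to the start of the index.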
