数组搜索中的Postgresql索引

当数组大小大于5时，如果扫描较低的索引，则在seq中搜索。为什么？
explain select * from collection_attributes ca where ca."type" IN (10,20, 30, 60,70)

结果

Bitmap Heap Scan on collection_attributes ca  (cost=373.71..1974.97 rows=26354 width=92) (actual time=0.669..4.707 rows=26542 loops=1)
  Recheck Cond: (type = ANY ('{10,20,30,60,70}'::integer[]))
  Heap Blocks: exact=1082
  Buffers: shared hit=1140
  ->  Bitmap Index Scan on collection_attributes_type_index  (cost=0.00..367.12 rows=26354 width=0) (actual time=0.571..0.572 rows=27597 loops=1)
        Index Cond: (type = ANY ('{10,20,30,60,70}'::integer[]))
        Buffers: shared hit=58
Planning Time: 0.059 ms
Execution Time: 5.557 ms

但是
explain select * from collection_attributes ca where ca."type" IN (10,20, 30, 60,70, 80);

结果

Seq Scan on collection_attributes ca  (cost=0.00..2375.09 rows=43841 width=92) (actual time=0.006..8.495 rows=44114 loops=1)
  Filter: (type = ANY ('{10,20,30,60,70,80}'::integer[]))
  Rows Removed by Filter: 24831
  Buffers: shared hit=1173
Planning Time: 0.060 ms
Execution Time: 9.904 ms

我需要索引搜索所有大小的数组

Seq Scan和index access之间的选择由优化器根据计算的成本来完成。
非平凡表的Seq Scan具有相当高的成本，但它是常数，与过滤条件无关。
相反，选择少量行的索引访问成本非常低，但它随所选行的数量线性增加。对于表中某些百分比的所选行，索引访问的成本将不可避免地增加顺序扫描的成本- * 这是您观察到的 *。

图示示例

create table test as
select 
mod(id,10)*10 type,
rpad('x',87,'y') pad
from generate_series(1,10000000) t(id)
;

create index test_idx on test(type);
analyze  test;

索引访问

要获取具有type in (10)的行，优化器选择开销为266174.50的Bitmap Heap Scan

explain analyze
select * from test where type in (10);

QUERY PLAN                                                                                                                       |
---------------------------------------------------------------------------------------------------------------------------------+
Bitmap Heap Scan on test  (cost=11259.68..266174.50 rows=1011000 width=92) (actual time=65.373..1086.653 rows=1000000 loops=1)   |
  Recheck Cond: (type = 10)                                                                                                      |
  Rows Removed by Index Recheck: 5780991                                                                                         |
  Heap Blocks: exact=55027 lossy=98820                                                                                           |
  ->  Bitmap Index Scan on test_idx  (cost=0.00..11006.93 rows=1011000 width=0) (actual time=51.336..51.337 rows=1000000 loops=1)|
        Index Cond: (type = 10)                                                                                                  |
Planning Time: 0.068 ms                                                                                                          |
...                     |
Execution Time: 1125.095 ms

顺序扫描

选择type in (10, 20)时，选择顺序扫描，成本为278847.00

explain analyze
select * from test where type in (10, 20);

QUERY PLAN                                                                                                       |
-----------------------------------------------------------------------------------------------------------------+
Seq Scan on test  (cost=0.00..278847.00 rows=2025333 width=92) (actual time=1.783..1069.878 rows=2000000 loops=1)|
  Filter: (type = ANY ('{10,20}'::integer[]))                                                                    |
  Rows Removed by Filter: 8000000                                                                                |
Planning Time: 0.065 ms                                                                                          |
...
Execution Time: 1132.009 ms

您可以验证使用SET enable_seqscan = OFF;禁用顺序扫描的原因
该计划使用了索引，但成本高于之前使用Seq扫描的计划，因此默认情况下未选择***

QUERY PLAN                                                                                                                       |
---------------------------------------------------------------------------------------------------------------------------------+
Bitmap Heap Scan on test  (cost=21925.21..279540.57 rows=2025333 width=92) (actual time=87.542..1265.229 rows=2000000 loops=1)   |
  Recheck Cond: (type = ANY ('{10,20}'::integer[]))                                                                              |
  Rows Removed by Index Recheck: 5138682                                                                                         |
  Heap Blocks: exact=55027 lossy=98820                                                                                           |
  ->  Bitmap Index Scan on test_idx  (cost=0.00..21418.87 rows=2025333 width=0) (actual time=73.613..73.613 rows=2000000 loops=1)|
        Index Cond: (type = ANY ('{10,20}'::integer[]))                                                                          |
Planning Time: 0.073 ms                                                                                                          |
...
Execution Time: 1342.678 ms

我需要索引搜索所有大小的数组
考虑到上面的讨论，你应该详细说明为什么你需要这个...

数组搜索中的Postgresql索引

1条答案

相关问题

热门标签

最新问答