PostgreSQL Simple JOIN非常慢

ycggw6v2 于 2023-11-18 发布在 PostgreSQL

关注(0)|答案(5)|浏览(175)

我有一个简单的查询和两个表：

drilldown

CREATE SEQUENCE drilldown_id_seq;

CREATE TABLE drilldown (
    transactionid bigint NOT NULL DEFAULT nextval('drilldown_id_seq'),
    userid bigint NOT NULL default 0 REFERENCES users(id),
    pathid bigint NOT NULL default 0,
    reqms bigint NOT NULL default 0,
    quems bigint NOT NULL default 0,
    clicktime timestamp default current_timestamp,
    PRIMARY KEY(transactionid)
);

ALTER SEQUENCE drilldown_id_seq OWNED BY drilldown.transactionid;

CREATE INDEX drilldown_idx1 ON drilldown (clicktime);

字符串

querystats

CREATE SEQUENCE querystats_id_seq;

CREATE TABLE querystats (
    id bigint NOT NULL DEFAULT nextval('querystats_id_seq'),
    transactionid bigint NOT NULL default 0 REFERENCES drilldown(transactionid),
    querynameid bigint NOT NULL default 0 REFERENCES queryname(id),
    queryms bigint NOT NULL default 0,
    PRIMARY KEY(id)
);

ALTER SEQUENCE querystats_id_seq OWNED BY querystats.id;

CREATE INDEX querystats_idx1 ON querystats (transactionid);
CREATE INDEX querystats_idx2 ON querystats (querynameid);

型
drilldown有150万条记录，而querystats有1000万条记录;当我在两者之间进行连接时，问题就发生了。

查询

explain analyse
select avg(qs.queryms)
  from querystats qs
  join drilldown d on (qs.transactionid=d.transactionid)
  where querynameid=1;

型

酒店QUERY HOTEL

Aggregate  (cost=528596.96..528596.97 rows=1 width=8) (actual time=5213.154..5213.154 rows=1 loops=1)
   ->  Hash Join  (cost=274072.53..518367.59 rows=4091746 width=8) (actual time=844.087..3528.788 rows=4117717 loops=1)
         Hash Cond: (qs.transactionid = d.transactionid)
         ->  Bitmap Heap Scan on querystats qs  (cost=88732.62..210990.44 rows=4091746 width=16) (actual time=309.502..1321.029 rows=4117717 loops=1)
               Recheck Cond: (querynameid = 1)
               ->  Bitmap Index Scan on querystats_idx2  (cost=0.00..87709.68 rows=4091746 width=0) (actual time=307.916..307.916 rows=4117718 loops=1)
                     Index Cond: (querynameid = 1)
         ->  Hash  (cost=162842.29..162842.29 rows=1371250 width=8) (actual time=534.065..534.065 rows=1372574 loops=1)
               Buckets: 4096  Batches: 64  Memory Usage: 850kB
               ->  Index Scan using drilldown_pkey on drilldown d  (cost=0.00..162842.29 rows=1371250 width=8) (actual time=0.015..364.657 rows=1372574 loops=1)
 Total runtime: 5213.205 ms
(11 rows)

型
我知道有一些调优参数我可以为PostgreSQL调整，但我想知道的是，我正在做的查询是连接两个表的最佳方式？
或者是某种内部连接？我只是不确定。
任何指针是赞赏！

编辑

database#\d drilldown
                                       Table "public.drilldown"
    Column     |            Type             |                       Modifiers                        
---------------+-----------------------------+--------------------------------------------------------
 transactionid | bigint                      | not null default nextval('drilldown_id_seq'::regclass)
 userid        | bigint                      | not null default 0
 pathid        | bigint                      | not null default 0
 reqms         | bigint                      | not null default 0
 quems         | bigint                      | not null default 0
 clicktime     | timestamp without time zone | default now()
Indexes:
    "drilldown_pkey" PRIMARY KEY, btree (transactionid)
    "drilldown_idx1" btree (clicktime)
Foreign-key constraints:
    "drilldown_userid_fkey" FOREIGN KEY (userid) REFERENCES users(id)
Referenced by:
    TABLE "querystats" CONSTRAINT "querystats_transactionid_fkey" FOREIGN KEY (transactionid) REFERENCES drilldown(transactionid)

database=# \d querystats
                            Table "public.querystats"
    Column     |  Type  |                        Modifiers                        
---------------+--------+---------------------------------------------------------
 id            | bigint | not null default nextval('querystats_id_seq'::regclass)
 transactionid | bigint | not null default 0
 querynameid   | bigint | not null default 0
 queryms       | bigint | not null default 0
Indexes:
    "querystats_pkey" PRIMARY KEY, btree (id)
    "querystats_idx1" btree (transactionid)
    "querystats_idx2" btree (querynameid)
Foreign-key constraints:
    "querystats_querynameid_fkey" FOREIGN KEY (querynameid) REFERENCES queryname(id)
    "querystats_transactionid_fkey" FOREIGN KEY (transactionid) REFERENCES drilldown(transactionid)

型
这里是两个表格的要求和版本

PostgreSQL 9.1.7 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

型
因此，这个查询所做的是从每个查询类型（querynameid）的queryms的所有行值中获取平均值

name            |         current_setting          |        source        
----------------------------+----------------------------------+----------------------
 application_name           | psql                             | client
 client_encoding            | UTF8                             | client
 DateStyle                  | ISO, MDY                         | configuration file
 default_text_search_config | pg_catalog.english               | configuration file
 enable_seqscan             | off                              | session
 external_pid_file          | /var/run/postgresql/9.1-main.pid | configuration file
 lc_messages                | en_US.UTF-8                      | configuration file
 lc_monetary                | en_US.UTF-8                      | configuration file
 lc_numeric                 | en_US.UTF-8                      | configuration file
 lc_time                    | en_US.UTF-8                      | configuration file
 log_line_prefix            | %t                               | configuration file
 log_timezone               | localtime                        | environment variable
 max_connections            | 100                              | configuration file
 max_stack_depth            | 2MB                              | environment variable
 port                       | 5432                             | configuration file
 shared_buffers             | 24MB                             | configuration file
 ssl                        | on                               | configuration file
 TimeZone                   | localtime                        | environment variable
 unix_socket_directory      | /var/run/postgresql              | configuration file
(19 rows)

型
我看到enable_seqscan=off，我没有触及任何设置，这是一个完全默认的安装。
更新
我从下面的评论中做了一些修改，这是结果。

explain analyse 
SELECT 
(
  SELECT avg(queryms) AS total 
  FROM querystats 
  WHERE querynameid=3
) as total 
FROM querystats qs 
JOIN drilldown d ON (qs.transactionid=d.transactionid) 
WHERE qs.querynameid=3 
limit 1;

型
查询计划

---------------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=196775.99..196776.37 rows=1 width=0) (actual time=2320.876..2320.876 rows=1 loops=1)
       InitPlan 1 (returns $0)
         ->  Aggregate  (cost=196775.94..196775.99 rows=1 width=8) (actual time=2320.815..2320.815 rows=1 loops=1)
               ->  Bitmap Heap Scan on querystats  (cost=24354.25..189291.69 rows=2993698 width=8) (actual time=226.516..1144.690 rows=2999798 loops=1)
                     Recheck Cond: (querynameid = 3)
                     ->  Bitmap Index Scan on querystats_idx  (cost=0.00..23605.83 rows=2993698 width=0) (actual time=225.119..225.119 rows=2999798 loops=1)
                           Index Cond: (querynameid = 3)
       ->  Nested Loop  (cost=0.00..1127817.12 rows=2993698 width=0) (actual time=2320.876..2320.876 rows=1 loops=1)
             ->  Seq Scan on drilldown d  (cost=0.00..76745.10 rows=1498798 width=8) (actual time=0.009..0.009 rows=1 loops=1)
             ->  Index Scan using querystats_idx on querystats qs  (cost=0.00..0.60 rows=2 width=8) (actual time=0.045..0.045 rows=1 loops=1)
                   Index Cond: ((querynameid = 3) AND (transactionid = d.transactionid))
     Total runtime: 2320.940 ms
    (12 rows)

型

postgresql

来源：https://stackoverflow.com/questions/14793708/postgresql-simple-join-very-slow

5条答案

按热度按时间

ujv3wf0j1#

它的行为就像你已经设置了enable_seqscan = off，因为它使用索引扫描来填充哈希表。除了诊断步骤，不要设置任何计划器选项，如果你正在显示计划，请显示使用的任何选项。这可以运行以显示许多有用的信息：

SELECT version();
SELECT name, current_setting(name), source
  FROM pg_settings
  WHERE source NOT IN ('default', 'override');

字符串
如果你告诉我们运行时环境，特别是机器上的RAM数量，你的存储系统是什么样子的，以及数据库的大小（或者更好的是，数据库中经常引用的数据的 * 活动数据集 *），这也会有所帮助。
作为一个粗略的细分，5.2秒细分为：

1.3秒找到4，117，717 querystats行符合您的选择标准。
2.3秒随机匹配drilldown记录。
1.6秒通过4，117，717行并计算平均值。
因此，即使你似乎已经削弱了它使用最快计划的能力，它也只需要1.26微秒（百万分之一秒）来定位每一行，将其连接到另一行，并将其用于计算平均值。这在绝对基础上并不太糟糕，但你几乎可以肯定会得到一个稍微快一点的计划。
首先，如果您使用的是9.2.x，其中x小于3，请立即升级到9.2.3。在最近的版本中修复了某些类型的计划的性能回归，这可能会影响此查询。一般来说，try to stay up-to-date on minor releases（其中版本号更改超过第二个点）。
您可以在单个会话中测试不同的计划，只需在该连接上设置规划因子并运行查询（或在其上运行EXPLAIN）。尝试以下操作：

SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;
SET cpu_tuple_cost = 0.05;
SET effective_cache_size = '3GB'; -- actually use shared_buffers plus OS cache

型
确保所有enable_设置都是on。

赞(0）回复(0）举报 2023-11-18

xpcnnkqh2#

你在问题中声称：
我看到enable_seqscan=off，我没有触及任何设置，这是一个完全默认的安装。
相反，pg_settings的输出告诉我们：
启用seqscan|关闭|届会议
这意味着，您在会话中设置了enable_seqscan = off**。这里有些东西不对劲。
运行

SET enable_seqscan = on;

字符串
或

RESET enable_seqscan;

型
Assert：

SHOW enable_seqscan;

型
此外，对于一个拥有数百万条记录的数据库来说，你对**shared_buffers**的设置 * 太低了 *。24MB似乎是Ubuntu开箱即用的保守设置。你需要编辑你的配置文件才能认真使用！我引用手册：
如果你有一个专用的数据库服务器，内存为1GB或更大，那么shared_buffers的合理初始值是系统内存的25%。
因此，编辑您的postgresql.conf文件以增加值并重新加载。
然后再次尝试查询并找出enable_seqscan是如何关闭的。

赞(0）回复(0）举报 2023-11-18

pvabu6sv3#

此查询中

select avg(qs.queryms) 
from querystats qs 
join drilldown d 
  on (qs.transactionid=d.transactionid) 
where querynameid=1;

字符串
你没有使用表“drilldown”中的任何列。由于外键约束保证了“querystats”中的每个“transactionid”在“drilldown”中都有一行，我不认为连接会做任何有用的事情。除非我错过了什么，否则你的查询相当于

select avg(qs.queryms) 
from querystats qs 
where querynameid=1;

型
根本没有join。只要在“querynameid”上有一个索引，你应该会得到不错的性能。

赞(0）回复(0）举报 2023-11-18

tyu7yeag4#

当你不加入时，avg(qs.queryms)只执行一次。
在执行联接时，执行avg(qs.queryms)的次数与联接生成的行的次数相同。
如果您总是对单个querynameid感兴趣，请尝试将avg(qs.queryms)放入子选择中：

SELECT 
    (SELECT avg(queryms) FROM querystats WHERE querynameid=1) 
FROM querystats qs 
JOIN drilldown d ON (qs.transactionid=d.transactionid) 
WHERE qs.querynameid=1;

字符串

赞(0）回复(0）举报 2023-11-18

pxyaymoc5#

querystats表对我来说看起来像一个 fat junction表。在这种情况下：省略代理键，并使用自然（复合）键（两个组件已经不是NULLable）并添加一个 reversed 复合索引。（单独的索引是无用的，FK约束会自动为您生成它们）

-- CREATE SEQUENCE querystats_id_seq;

CREATE TABLE querystats (
    -- id bigint NOT NULL DEFAULT nextval('querystats_id_seq'),
    transactionid bigint NOT NULL default 0 REFERENCES drilldown(transactionid),
    querynameid bigint NOT NULL default 0 REFERENCES queryname(id),
    queryms bigint NOT NULL default 0,
    PRIMARY KEY(transactionid,querynameid )
);

-- ALTER SEQUENCE querystats_id_seq OWNED BY querystats.id;

--CREATE INDEX querystats_idx1 ON querystats (transactionid);
-- CREATE INDEX querystats_idx2 ON querystats (querynameid);
CREATE UNIQUE INDEX querystats_alt ON querystats (querynameid, transactionid);

字符串

赞(0）回复(0）举报 2023-11-18

我来回答

PostgreSQL Simple JOIN非常慢

编辑

5条答案

相关问题

热门标签

最新问答