PostgreSQL和用于分析频繁大型插入的性能成本的工具

smdncfj3  于 2023-03-08  发布在  PostgreSQL
关注(0)|答案(1)|浏览(210)

我正在设计一个将许多行插入到多个表中的摄取过程。我们在Azure的PostgreSQL Flexible服务器上运行PostgreSQL。目前看来,写入/插入此服务器的速度是增加负载吞吐量的瓶颈(通过一个目标完成率为每分钟100个负载的队列,使用REST API处理JSON负载)。
是否有PostgreSQL / database工具可以用来分析INSERT性能,类似于我们使用EXPLAIN ANALYZE分析SELECT性能的方法?我知道索引和外键会影响INSERT速度。我想每个表的实际列类型和插入字符串的长度也会有影响。
下面是两个较大的表的插入速度测试结果,我们使用Python中的psycopg 2包批量执行_values“INSERT INTO... VALUES”:
资源表,更好的性能和对行计数的敏感度较低。

"Resource Insert: 35 ms, 120 rows, 0.2916666666666667 ms/row" # 1 Slide model
"Resource Insert: 36 ms, 120 rows, 0.3 ms/row"
"Resource Insert: 47 ms, 120 rows, 0.39166666666666666 ms/row"
"Resource Insert: 99 ms, 381 rows, 0.25984251968503935 ms/row" # 10 Slides
"Resource Insert: 97 ms, 381 rows, 0.2545931758530184 ms/row"
"Resource Insert: 117 ms, 381 rows, 0.30708661417322836 ms/row"
"Resource Insert: 654 ms, 2991 rows, 0.21865596790371114 ms/row" # 100 Slides
"Resource Insert: 683 ms, 2991 rows, 0.22835172183216315 ms/row"
"Resource Insert: 665 ms, 2991 rows, 0.2223336676696757 ms/row"
"Resource Insert: 5604 ms, 29091 rows, 0.19263689800969372 ms/row" # 1000 Slides
"Resource Insert: 5498 ms, 29091 rows, 0.1889931593963769 ms/row"
"Resource Insert: 5021 ms, 29091 rows, 0.17259633563645113 ms/row"
"Resource Insert: 4743 ms, 29091 rows, 0.16304011549963907 ms/row"
"Resource Insert: 8428 ms, 29091 rows, 0.2897115946512667 ms/row"
"Resource Insert: 7788 ms, 29091 rows, 0.26771166340105185 ms/row"
"Resource Insert: 7367 ms, 29091 rows, 0.25323983362551994 ms/row"

资源值表,插入性能较差,行数越多扩展性越差

"Resource Val Insert: 378 ms, 113 rows, 3.3451327433628317 ms/row" ~ 1 Slide + Acc,Pat,Cli,TestOrder,Block
"Resource Val Insert: 365 ms, 113 rows, 3.230088495575221 ms/row" 
"Resource Val Insert: 356 ms, 113 rows, 3.150442477876106 ms/row"
"Resource Val Insert: 422 ms, 113 rows, 3.734513274336283 ms/row"
"Resource Val Insert: 439 ms, 113 rows, 3.8849557522123894 ms/row"
"Resource Val Insert: 354 ms, 113 rows, 3.1327433628318584 ms/row"
"Resource Val Insert: 1498 ms, 365 rows, 4.104109589041096 ms/row" # 10 Slides
"Resource Val Insert: 1509 ms, 365 rows, 4.134246575342465 ms/row"
"Resource Val Insert: 1568 ms, 365 rows, 4.295890410958904 ms/row"
"Resource Val Insert: 1607 ms, 365 rows, 4.402739726027397 ms/row"
"Resource Val Insert: 1553 ms, 365 rows, 4.254794520547946 ms/row"
"Resource Val Insert: 14187 ms, 2885 rows, 4.917504332755633 ms/row" # 100 Slides
"Resource Val Insert: 14239 ms, 2885 rows, 4.935528596187175 ms/row"
"Resource Val Insert: 13494 ms, 2885 rows, 4.677296360485268 ms/row"
"Resource Val Insert: 14489 ms, 2885 rows, 5.0221837088388215 ms/row"
"Resource Val Insert: 14044 ms, 2885 rows, 4.867937608318891 ms/row"
"Resource Val Insert: 131665 ms, 28085 rows, 4.688089727612605 ms/row" # 1000 Slides
"Resource Val Insert: 133214 ms, 28085 rows, 4.743243724408047 ms/row"
"Resource Val Insert: 132070 ms, 28085 rows, 4.7025102367812 ms/row"

下面是相关表的DDL,这可能是一个因素:平均性能插入-资源表

Table "public.resource"
      Column       |            Type             | Collation | Nullable |   Default    | Storage  | Stats target | Description
-------------------+-----------------------------+-----------+----------+--------------+----------+--------------+-------------
 resource_id       | integer                     |           | not null |              | plain    |              |
 uuid              | uuid                        |           | not null |              | plain    |              |
 name              | character varying           |           |          |              | extended |              |
 url               | character varying           |           |          |              | extended |              |
 desc              | character varying           |           |          |              | extended |              |
 barcode           | character varying           |           |          |              | extended |              |
 barcode_type      | character varying           |           |          |              | extended |              |
 meta              | jsonb                       |           |          |              | extended |              |
 cls               | integer                     |           | not null |              | plain    |              |
 archived          | boolean                     |           | not null |              | plain    |              |
 view_template     | character varying           |           | not null |              | extended |              |
 created_timestamp | timestamp without time zone |           | not null |              | plain    |              |
 updated_timestamp | timestamp without time zone |           | not null |              | plain    |              |
 owner_resource_id | integer                     |           |          |              | plain    |              |
 tenant            | character varying           |           | not null | CURRENT_USER | extended |              |
Indexes:
    "resource_pk" PRIMARY KEY, btree (resource_id, tenant)
    "_uuid_tenant_uc" UNIQUE CONSTRAINT, btree (uuid, tenant)
    "param_group_unique_name" UNIQUE, btree (name) WHERE cls = 600
    "resource_resource_id_key" UNIQUE CONSTRAINT, btree (resource_id)
    "ix_resource_barcode" btree (barcode)
    "ix_resource_desc" btree ("desc")
    "ix_resource_name" btree (name)
    "ix_resource_uuid" btree (uuid)
    "lab7_role_unique_name" btree (lower(name::text)) WHERE cls = 221 OR cls = 222 OR cls = 220
    "resource_created_timestamp_idx" btree (created_timestamp DESC)
    "resource_updated_timestamp_idx" btree (updated_timestamp DESC)
Check constraints:
    "resource_check" CHECK (resource_id <> owner_resource_id)
Foreign-key constraints:
    "resource_owner_resource_id_fkey" FOREIGN KEY (owner_resource_id) REFERENCES resource(resource_id)
Referenced by:
    TABLE "address" CONSTRAINT "address_address_id_fkey" FOREIGN KEY (address_id) REFERENCES resource(resource_id)
    # .... and many more references! This is our most basic table that acts sort of like an abstract base class

性能不佳的表格插入

\d+ resource_val
                                                Table "public.resource_val"
         Column          |       Type        | Collation | Nullable |   Default    | Storage  | Stats target | Description
-------------------------+-------------------+-----------+----------+--------------+----------+--------------+-------------
 resource_val_id         | integer           |           | not null |              | plain    |              |
 resource_var_id         | integer           |           |          |              | plain    |              |
 value_group_id          | integer           |           |          |              | plain    |              |
 sort_id                 | integer           |           |          |              | plain    |              |
 value                   | character varying |           |          |              | extended |              |
 expression              | character varying |           |          |              | extended |              |
 dropdown                | jsonb             |           |          |              | extended |              |
 bound_resource_id       | integer           |           |          |              | plain    |              |
 error_msg               | character varying |           |          |              | extended |              |
 dropdown_error_msg      | character varying |           |          |              | extended |              |
 tenant                  | character varying |           | not null | CURRENT_USER | extended |              |
 step_instance_sample_id | integer           |           |          |              | plain    |              |
Indexes:
    "resource_val_pkey" PRIMARY KEY, btree (resource_val_id)
    "idx_res_val_hash" hash (value)
    "ix_r_val_bound_resource_id" btree (bound_resource_id)
    "ix_r_val_sis_id" btree (step_instance_sample_id)
Foreign-key constraints:
    "resource_val_bound_resource_id_fkey" FOREIGN KEY (bound_resource_id) REFERENCES resource(resource_id)
    "resource_val_resource_val_id_fkey" FOREIGN KEY (resource_val_id) REFERENCES resource(resource_id)
    "resource_val_resource_var_id_fkey" FOREIGN KEY (resource_var_id) REFERENCES resource_var(resource_var_id)
    "resource_val_step_instance_sample_id_fkey" FOREIGN KEY (step_instance_sample_id) REFERENCES step_instance_sample(association_id)
    "resource_val_value_group_id_fkey" FOREIGN KEY (value_group_id) REFERENCES resource_val_group(id)
6rqinv9w

6rqinv9w1#

EXPLAIN (ANALYZE, BUFFERS)确实适用于INSERT,虽然它不像SELECT那样有用,但它仍然是您最好的起点,它至少可以单独分解外键约束的计时,尽管它不能分解索引维护时间。
你的“可伸缩性差”对我来说并不正确。性能下降很小,而且不一致。例如,“100张幻灯片”的大多数时间都比“1000张幻灯片”的任何时间都要差。假设演示文稿是按时间顺序排列的,也许随着时间的推移会发生其他事情,比如从云提供商那里用完“学分”,或者操作系统缓存被脏页堵塞,或者仅仅是索引增长到不能再保留在缓存中的程度(但是这种差异会比您看到的要大得多)。为了让人信服,您需要对更多的索引运行测试,并且批处理大小是随机的,而不是可预测的顺序。如果差异(批处理大小)是真实的,那么它大到足以让人担心吗?
现在插入的伸缩性很差,但这与插入的表的索引大小有关,而不是插入的批处理的大小。当不良行为开始时,就像从悬崖上掉下来一样,不仅仅是性能下降20%。
通常,研究这类问题的最佳方法(假设只研究EXPLAIN (ANALYZE, BUFFERS)还不够)是删除一些约束/索引,然后看看会发生什么。

相关问题