我正在设计一个将许多行插入到多个表中的摄取过程。我们在Azure的PostgreSQL Flexible服务器上运行PostgreSQL。目前看来,写入/插入此服务器的速度是增加负载吞吐量的瓶颈(通过一个目标完成率为每分钟100个负载的队列,使用REST API处理JSON负载)。
是否有PostgreSQL / database工具可以用来分析INSERT性能,类似于我们使用EXPLAIN ANALYZE分析SELECT性能的方法?我知道索引和外键会影响INSERT速度。我想每个表的实际列类型和插入字符串的长度也会有影响。
下面是两个较大的表的插入速度测试结果,我们使用Python中的psycopg 2包批量执行_values“INSERT INTO... VALUES”:
资源表,更好的性能和对行计数的敏感度较低。
"Resource Insert: 35 ms, 120 rows, 0.2916666666666667 ms/row" # 1 Slide model
"Resource Insert: 36 ms, 120 rows, 0.3 ms/row"
"Resource Insert: 47 ms, 120 rows, 0.39166666666666666 ms/row"
"Resource Insert: 99 ms, 381 rows, 0.25984251968503935 ms/row" # 10 Slides
"Resource Insert: 97 ms, 381 rows, 0.2545931758530184 ms/row"
"Resource Insert: 117 ms, 381 rows, 0.30708661417322836 ms/row"
"Resource Insert: 654 ms, 2991 rows, 0.21865596790371114 ms/row" # 100 Slides
"Resource Insert: 683 ms, 2991 rows, 0.22835172183216315 ms/row"
"Resource Insert: 665 ms, 2991 rows, 0.2223336676696757 ms/row"
"Resource Insert: 5604 ms, 29091 rows, 0.19263689800969372 ms/row" # 1000 Slides
"Resource Insert: 5498 ms, 29091 rows, 0.1889931593963769 ms/row"
"Resource Insert: 5021 ms, 29091 rows, 0.17259633563645113 ms/row"
"Resource Insert: 4743 ms, 29091 rows, 0.16304011549963907 ms/row"
"Resource Insert: 8428 ms, 29091 rows, 0.2897115946512667 ms/row"
"Resource Insert: 7788 ms, 29091 rows, 0.26771166340105185 ms/row"
"Resource Insert: 7367 ms, 29091 rows, 0.25323983362551994 ms/row"
资源值表,插入性能较差,行数越多扩展性越差
"Resource Val Insert: 378 ms, 113 rows, 3.3451327433628317 ms/row" ~ 1 Slide + Acc,Pat,Cli,TestOrder,Block
"Resource Val Insert: 365 ms, 113 rows, 3.230088495575221 ms/row"
"Resource Val Insert: 356 ms, 113 rows, 3.150442477876106 ms/row"
"Resource Val Insert: 422 ms, 113 rows, 3.734513274336283 ms/row"
"Resource Val Insert: 439 ms, 113 rows, 3.8849557522123894 ms/row"
"Resource Val Insert: 354 ms, 113 rows, 3.1327433628318584 ms/row"
"Resource Val Insert: 1498 ms, 365 rows, 4.104109589041096 ms/row" # 10 Slides
"Resource Val Insert: 1509 ms, 365 rows, 4.134246575342465 ms/row"
"Resource Val Insert: 1568 ms, 365 rows, 4.295890410958904 ms/row"
"Resource Val Insert: 1607 ms, 365 rows, 4.402739726027397 ms/row"
"Resource Val Insert: 1553 ms, 365 rows, 4.254794520547946 ms/row"
"Resource Val Insert: 14187 ms, 2885 rows, 4.917504332755633 ms/row" # 100 Slides
"Resource Val Insert: 14239 ms, 2885 rows, 4.935528596187175 ms/row"
"Resource Val Insert: 13494 ms, 2885 rows, 4.677296360485268 ms/row"
"Resource Val Insert: 14489 ms, 2885 rows, 5.0221837088388215 ms/row"
"Resource Val Insert: 14044 ms, 2885 rows, 4.867937608318891 ms/row"
"Resource Val Insert: 131665 ms, 28085 rows, 4.688089727612605 ms/row" # 1000 Slides
"Resource Val Insert: 133214 ms, 28085 rows, 4.743243724408047 ms/row"
"Resource Val Insert: 132070 ms, 28085 rows, 4.7025102367812 ms/row"
下面是相关表的DDL,这可能是一个因素:平均性能插入-资源表
Table "public.resource"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------------+-----------------------------+-----------+----------+--------------+----------+--------------+-------------
resource_id | integer | | not null | | plain | |
uuid | uuid | | not null | | plain | |
name | character varying | | | | extended | |
url | character varying | | | | extended | |
desc | character varying | | | | extended | |
barcode | character varying | | | | extended | |
barcode_type | character varying | | | | extended | |
meta | jsonb | | | | extended | |
cls | integer | | not null | | plain | |
archived | boolean | | not null | | plain | |
view_template | character varying | | not null | | extended | |
created_timestamp | timestamp without time zone | | not null | | plain | |
updated_timestamp | timestamp without time zone | | not null | | plain | |
owner_resource_id | integer | | | | plain | |
tenant | character varying | | not null | CURRENT_USER | extended | |
Indexes:
"resource_pk" PRIMARY KEY, btree (resource_id, tenant)
"_uuid_tenant_uc" UNIQUE CONSTRAINT, btree (uuid, tenant)
"param_group_unique_name" UNIQUE, btree (name) WHERE cls = 600
"resource_resource_id_key" UNIQUE CONSTRAINT, btree (resource_id)
"ix_resource_barcode" btree (barcode)
"ix_resource_desc" btree ("desc")
"ix_resource_name" btree (name)
"ix_resource_uuid" btree (uuid)
"lab7_role_unique_name" btree (lower(name::text)) WHERE cls = 221 OR cls = 222 OR cls = 220
"resource_created_timestamp_idx" btree (created_timestamp DESC)
"resource_updated_timestamp_idx" btree (updated_timestamp DESC)
Check constraints:
"resource_check" CHECK (resource_id <> owner_resource_id)
Foreign-key constraints:
"resource_owner_resource_id_fkey" FOREIGN KEY (owner_resource_id) REFERENCES resource(resource_id)
Referenced by:
TABLE "address" CONSTRAINT "address_address_id_fkey" FOREIGN KEY (address_id) REFERENCES resource(resource_id)
# .... and many more references! This is our most basic table that acts sort of like an abstract base class
性能不佳的表格插入
\d+ resource_val
Table "public.resource_val"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------------------+-------------------+-----------+----------+--------------+----------+--------------+-------------
resource_val_id | integer | | not null | | plain | |
resource_var_id | integer | | | | plain | |
value_group_id | integer | | | | plain | |
sort_id | integer | | | | plain | |
value | character varying | | | | extended | |
expression | character varying | | | | extended | |
dropdown | jsonb | | | | extended | |
bound_resource_id | integer | | | | plain | |
error_msg | character varying | | | | extended | |
dropdown_error_msg | character varying | | | | extended | |
tenant | character varying | | not null | CURRENT_USER | extended | |
step_instance_sample_id | integer | | | | plain | |
Indexes:
"resource_val_pkey" PRIMARY KEY, btree (resource_val_id)
"idx_res_val_hash" hash (value)
"ix_r_val_bound_resource_id" btree (bound_resource_id)
"ix_r_val_sis_id" btree (step_instance_sample_id)
Foreign-key constraints:
"resource_val_bound_resource_id_fkey" FOREIGN KEY (bound_resource_id) REFERENCES resource(resource_id)
"resource_val_resource_val_id_fkey" FOREIGN KEY (resource_val_id) REFERENCES resource(resource_id)
"resource_val_resource_var_id_fkey" FOREIGN KEY (resource_var_id) REFERENCES resource_var(resource_var_id)
"resource_val_step_instance_sample_id_fkey" FOREIGN KEY (step_instance_sample_id) REFERENCES step_instance_sample(association_id)
"resource_val_value_group_id_fkey" FOREIGN KEY (value_group_id) REFERENCES resource_val_group(id)
1条答案
按热度按时间6rqinv9w1#
EXPLAIN (ANALYZE, BUFFERS)
确实适用于INSERT,虽然它不像SELECT那样有用,但它仍然是您最好的起点,它至少可以单独分解外键约束的计时,尽管它不能分解索引维护时间。你的“可伸缩性差”对我来说并不正确。性能下降很小,而且不一致。例如,“100张幻灯片”的大多数时间都比“1000张幻灯片”的任何时间都要差。假设演示文稿是按时间顺序排列的,也许随着时间的推移会发生其他事情,比如从云提供商那里用完“学分”,或者操作系统缓存被脏页堵塞,或者仅仅是索引增长到不能再保留在缓存中的程度(但是这种差异会比您看到的要大得多)。为了让人信服,您需要对更多的索引运行测试,并且批处理大小是随机的,而不是可预测的顺序。如果差异(批处理大小)是真实的,那么它大到足以让人担心吗?
现在插入的伸缩性很差,但这与插入的表的索引大小有关,而不是插入的批处理的大小。当不良行为开始时,就像从悬崖上掉下来一样,不仅仅是性能下降20%。
通常,研究这类问题的最佳方法(假设只研究
EXPLAIN (ANALYZE, BUFFERS)
还不够)是删除一些约束/索引,然后看看会发生什么。