这是我的表结构:
PERSON (4.1 M rows)
id : UUID
notes: string
...
---
ATTRIBUTES (21 M rows)
id: UUID
person_id: <persons id> non null
name: ENUM : full_name | birthday | email | phone | photo
value: JSONB : {value:""} | { first_name:""} | {arbitrary schema based on `name`}
edited_value: JSONB : {value:""} | { first_name:""} | {arbitrary schema based on `name`}
...
---
1 Person -> N Attributes
因此,现在查询响应应该如下所示:
| 人员标识|注解|属性|
| - -|- -|- -|
| 唯一标识符|一些东西|[{a1}、{a2}...(所有属性)]|
| 唯一标识符|一些东西|[{a1}、{a2}...(所有属性)]|
问题是我需要根据特定条件对列表进行排序和筛选。
- 基于
ATTRIBUTE->edited_value
排序||特定ATTRIBUTE->name
的ATTRIBUTE->value
- 如按名字排序(因此按ATTRIBUTES中的行排序,其中name=first_name)
- 按
Attributes -> name
筛选 - 比如GET所有有生日的人(但是输出的
attributes
列应该有所有属性)
我目前的查询如下所示:
1.获取具有birthday
的所有人员并按其first_name
排序
我现在拥有的查询看起来像这样,但它需要5秒钟在本地执行
SELECT
p.id,
JSONB_AGG(a.*) as attributes,
FROM
person p
LEFT JOIN
"attribute" a ON a.person_id = p.id
AND a.deleted is false
LEFT JOIN
"attribute" ba ON ba.name = 'birthday'
AND ba.person_id = p.id
AND ba.deleted is false
LEFT JOIN
"attribute" fa ON fa.name = 'full_name'
AND fa.person_id = p.id
AND fa.deleted is false
WHERE
p.som_col_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'
AND ba.id IS NOT NULL
AND p.deleted_at IS NULL
GROUP BY
p.id
ORDER BY
LOWER(MAX(COALESCE(fa.edited_value, fa.value)->>'first_name')) ASC NULLS LAST
除了attribute.deleted
之外,我对where子句上几乎所有的列都有一个索引。
从解释计划来看,排序似乎要花很多时间
Sort (cost=46184.33..46185.19 rows=344 width=80) (actual time=284.351..284.604 rows=115 loops=1) |
Sort Key: (lower(max((COALESCE(fa.edited_value, fa.value) ->> 'first_name'::text)))) |
Sort Method: quicksort Memory: 1750kB |
Buffers: shared hit=7685 read=29921, temp read=314 written=315 |
-> GroupAggregate (cost=46159.52..46169.84 rows=344 width=80) (actual time=258.759..283.345 rows=115 loops=1) |
Group Key: p.id |
Buffers: shared hit=7685 read=29921, temp read=314 written=315 |
-> Sort (cost=46159.52..46160.38 rows=344 width=116) (actual time=258.709..261.751 rows=15967 loops=1) |
Sort Key: p.id |
Sort Method: external merge Disk: 2512kB |
Buffers: shared hit=7685 read=29921, temp read=314 written=315 |
-> Gather (cost=3553.65..46145.03 rows=344 width=116) (actual time=35.024..243.703 rows=15967 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
Buffers: shared hit=7685 read=29921 |
-> Nested Loop Left Join (cost=2553.65..45110.63 rows=143 width=116) (actual time=19.666..108.066 rows=5322 loops=3) |
Buffers: shared hit=7685 read=29921 |
-> Nested Loop Left Join (cost=2553.23..41298.71 rows=52 width=80) (actual time=18.861..100.495 rows=38 loops=3) |
Buffers: shared hit=6341 read=29921 |
-> Parallel Hash Join (cost=2552.80..40360.76 rows=52 width=16) (actual time=18.794..99.999 rows=38 loops=3) |
Hash Cond: (ba.contact_id = p.id) |
Buffers: shared hit=5879 read=29921 |
-> Parallel Seq Scan on attribute ba (cost=0.00..37806.51 rows=554 width=16) (actual time=0.624..94.651 rows=1072 loops=3) |
Filter: ((deleted IS FALSE) AND (id IS NOT NULL) AND (name = 'birthday'::attribute_name)) |
Rows Removed by Filter: 177008 |
Buffers: shared hit=5103 read=29921 |
-> Parallel Hash (cost=2486.66..2486.66 rows=5291 width=16) (actual time=3.893..3.894 rows=776 loops=3) |
Buckets: 16384 Batches: 1 Memory Usage: 256kB |
Buffers: shared hit=692 |
-> Parallel Bitmap Heap Scan on contact p (cost=504.00..2486.66 rows=5291 width=16) (actual time=1.132..10.831 rows=2329 loops=1) |
Recheck Cond: (user_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'::uuid) |
Filter: (deleted_at IS NULL) |
Rows Removed by Filter: 9180 |
Heap Blocks: exact=568 |
Buffers: shared hit=692 |
-> Bitmap Index Scan on contact_user_id_index (cost=0.00..501.75 rows=11378 width=0) (actual time=1.014..1.014 rows=11526 loops=1) |
Index Cond: (user_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'::uuid) |
Buffers: shared hit=120 |
-> Index Scan using attribute_contact_id_name_deleted_index on attribute fa (cost=0.42..17.97 rows=7 width=80) (actual time=0.011..0.011 rows=1 loops=115)|
Index Cond: ((contact_id = p.id) AND (name = 'full_name'::attribute_name) AND (deleted = false)) |
Buffers: shared hit=462 |
-> Index Scan using attribute_contact_id_name_deleted_index on attribute a (cost=0.42..59.95 rows=1336 width=52) (actual time=0.004..0.059 rows=139 loops=115) |
Index Cond: ((contact_id = p.id) AND (deleted = false)) |
Buffers: shared hit=1344 |
Planning Time: 0.467 ms |
Execution Time: 285.484 ms |
你们能给予我一些关于我能做的改进的建议吗?
谢谢
编辑1:使用分析、缓冲区更新了说明编辑2:更新查询以删除歧义
2条答案
按热度按时间1bqhqjot1#
如果你不把属性聚合到JSON数组中(例如
jsonb[]
),而是聚合到一个常规的JSON值中,你可以直接测试birthday
属性是否存在。我发现
canonical_value
、value
和edited_value
有点混乱。但是,也许您可以删除
full_name
的左连接,方法是在聚合中创建另一个包含正确first_name的JSONB对象,然后按顺序引用该对象。1l5u6lss2#
您的数据模型不是很好,而且在问题的过程中,您的表名和列名不断变化的方式非常令人困惑。
但看起来您确实需要一个以“name”开头的“attribute”的索引,或者可能需要一个以
WHERE deleted is false
开头的部分索引。这应该有助于改进您向我们展示的计划的查询的执行,但是由于这只花费了0.3秒,而不是5秒,所以很难知道它会对看不见的5秒查询计划做什么。