PostgreSQL将同一个表连接两次，一次用于获取数据，另一次用于过滤

这是我的表结构：

PERSON (4.1 M rows)
id : UUID
notes: string
...
---

ATTRIBUTES (21 M rows)
id:             UUID
person_id:      <persons id> non null
name:           ENUM : full_name | birthday | email | phone | photo
value:          JSONB : {value:""} | { first_name:""} | {arbitrary schema based on `name`}
edited_value:   JSONB : {value:""} | { first_name:""} | {arbitrary schema based on `name`}
...

---

1 Person  -> N Attributes

因此，现在查询响应应该如下所示：
| 人员标识|注解|属性|
| - -|- -|- -|
| 唯一标识符|一些东西|[{a1}、{a2}...（所有属性）]|
| 唯一标识符|一些东西|[{a1}、{a2}...（所有属性）]|
问题是我需要根据特定条件对列表进行排序和筛选。

基于ATTRIBUTE->edited_value排序||特定ATTRIBUTE->name的ATTRIBUTE->value
如按名字排序（因此按ATTRIBUTES中的行排序，其中name=first_name）
按Attributes -> name筛选
比如GET所有有生日的人（但是输出的attributes列应该有所有属性）

我目前的查询如下所示：
1.获取具有birthday的所有人员并按其first_name排序
我现在拥有的查询看起来像这样，但它需要5秒钟在本地执行

SELECT
    p.id,
    JSONB_AGG(a.*) as attributes,
FROM
    person p
LEFT JOIN
    "attribute" a ON a.person_id = p.id
                  AND a.deleted is false
LEFT JOIN 
    "attribute" ba ON ba.name = 'birthday'
                   AND ba.person_id = p.id
                   AND ba.deleted is false
LEFT JOIN 
    "attribute" fa ON fa.name = 'full_name'
                   AND fa.person_id = p.id
                   AND fa.deleted is false
WHERE
    p.som_col_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'
    AND ba.id IS NOT NULL
    AND p.deleted_at IS NULL
GROUP BY
    p.id
ORDER BY
    LOWER(MAX(COALESCE(fa.edited_value, fa.value)->>'first_name')) ASC NULLS LAST

除了attribute.deleted之外，我对where子句上几乎所有的列都有一个索引。
从解释计划来看，排序似乎要花很多时间

Sort  (cost=46184.33..46185.19 rows=344 width=80) (actual time=284.351..284.604 rows=115 loops=1)                                                                                           |
  Sort Key: (lower(max((COALESCE(fa.edited_value, fa.value) ->> 'first_name'::text))))                                                                                                      |
  Sort Method: quicksort  Memory: 1750kB                                                                                                                                                    |
  Buffers: shared hit=7685 read=29921, temp read=314 written=315                                                                                                                            |
  ->  GroupAggregate  (cost=46159.52..46169.84 rows=344 width=80) (actual time=258.759..283.345 rows=115 loops=1)                                                                           |
        Group Key: p.id                                                                                                                                                                     |
        Buffers: shared hit=7685 read=29921, temp read=314 written=315                                                                                                                      |
        ->  Sort  (cost=46159.52..46160.38 rows=344 width=116) (actual time=258.709..261.751 rows=15967 loops=1)                                                                            |
              Sort Key: p.id                                                                                                                                                                |
              Sort Method: external merge  Disk: 2512kB                                                                                                                                     |
              Buffers: shared hit=7685 read=29921, temp read=314 written=315                                                                                                                |
              ->  Gather  (cost=3553.65..46145.03 rows=344 width=116) (actual time=35.024..243.703 rows=15967 loops=1)                                                                      |
                    Workers Planned: 2                                                                                                                                                      |
                    Workers Launched: 2                                                                                                                                                     |
                    Buffers: shared hit=7685 read=29921                                                                                                                                     |
                    ->  Nested Loop Left Join  (cost=2553.65..45110.63 rows=143 width=116) (actual time=19.666..108.066 rows=5322 loops=3)                                                  |
                          Buffers: shared hit=7685 read=29921                                                                                                                               |
                          ->  Nested Loop Left Join  (cost=2553.23..41298.71 rows=52 width=80) (actual time=18.861..100.495 rows=38 loops=3)                                                |
                                Buffers: shared hit=6341 read=29921                                                                                                                         |
                                ->  Parallel Hash Join  (cost=2552.80..40360.76 rows=52 width=16) (actual time=18.794..99.999 rows=38 loops=3)                                              |
                                      Hash Cond: (ba.contact_id = p.id)                                                                                                                     |
                                      Buffers: shared hit=5879 read=29921                                                                                                                   |
                                      ->  Parallel Seq Scan on attribute ba  (cost=0.00..37806.51 rows=554 width=16) (actual time=0.624..94.651 rows=1072 loops=3)                          |
                                            Filter: ((deleted IS FALSE) AND (id IS NOT NULL) AND (name = 'birthday'::attribute_name))                                                       |
                                            Rows Removed by Filter: 177008                                                                                                                  |
                                            Buffers: shared hit=5103 read=29921                                                                                                             |
                                      ->  Parallel Hash  (cost=2486.66..2486.66 rows=5291 width=16) (actual time=3.893..3.894 rows=776 loops=3)                                             |
                                            Buckets: 16384  Batches: 1  Memory Usage: 256kB                                                                                                 |
                                            Buffers: shared hit=692                                                                                                                         |
                                            ->  Parallel Bitmap Heap Scan on contact p  (cost=504.00..2486.66 rows=5291 width=16) (actual time=1.132..10.831 rows=2329 loops=1)             |
                                                  Recheck Cond: (user_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'::uuid)                                                                    |
                                                  Filter: (deleted_at IS NULL)                                                                                                              |
                                                  Rows Removed by Filter: 9180                                                                                                              |
                                                  Heap Blocks: exact=568                                                                                                                    |
                                                  Buffers: shared hit=692                                                                                                                   |
                                                  ->  Bitmap Index Scan on contact_user_id_index  (cost=0.00..501.75 rows=11378 width=0) (actual time=1.014..1.014 rows=11526 loops=1)      |
                                                        Index Cond: (user_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'::uuid)                                                                |
                                                        Buffers: shared hit=120                                                                                                             |
                                ->  Index Scan using attribute_contact_id_name_deleted_index on attribute fa  (cost=0.42..17.97 rows=7 width=80) (actual time=0.011..0.011 rows=1 loops=115)|
                                      Index Cond: ((contact_id = p.id) AND (name = 'full_name'::attribute_name) AND (deleted = false))                                                      |
                                      Buffers: shared hit=462                                                                                                                               |
                          ->  Index Scan using attribute_contact_id_name_deleted_index on attribute a  (cost=0.42..59.95 rows=1336 width=52) (actual time=0.004..0.059 rows=139 loops=115)  |
                                Index Cond: ((contact_id = p.id) AND (deleted = false))                                                                                                     |
                                Buffers: shared hit=1344                                                                                                                                    |
Planning Time: 0.467 ms                                                                                                                                                                     |
Execution Time: 285.484 ms                                                                                                                                                                  |

你们能给予我一些关于我能做的改进的建议吗？
谢谢
编辑1：使用分析、缓冲区更新了说明编辑2：更新查询以删除歧义

如果你不把属性聚合到JSON数组中（例如jsonb[]），而是聚合到一个常规的JSON值中，你可以直接测试birthday属性是否存在。

SELECT p.id,
       a.attributes
FROM person p
  LEFT JOIN ( 
    select person_id, 
           jsonb_object_agg(att.name, att.canonical_value)) as attributes,
           jsonb_object_agg(att.name, MAX(COALESCE(fa.edited_value, fa.value))) as attributes,
    from attribute att
    where att.deleted = false
    group by att.person_id 
  ) a ON a.person_id = p.id
  LEFT JOIN 
      "attribute" fa ON fa.name = 'full_name'
                     AND fa.person_id = p.id
                     AND fa.deleted is false
WHERE p.som_col_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'
  AND a.attributes ? 'birthday'
  AND p.deleted_at IS NULL
ORDER BY LOWER(MAX(COALESCE(fa.edited_value, fa.value)->>'first_name')) ASC NULLS LAST

我发现canonical_value、value和edited_value有点混乱。
但是，也许您可以删除full_name的左连接，方法是在聚合中创建另一个包含正确first_name的JSONB对象，然后按顺序引用该对象。

SELECT p.id,
       a.attributes
FROM person p
  LEFT JOIN ( 
    select att.person_id, 
           jsonb_object_agg(att.name, att.canonical_value) as attributes,
           jsonb_object_agg(att.name, COALESCE(att.edited_value, fa.value)) as real_values
    from attribute att
    where att.deleted = false
    group by att.person_id 
  ) a ON a.person_id = p.id
WHERE p.som_col_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'
  AND attributes ? 'birthday'
  AND p.deleted_at IS NULL
ORDER BY LOWER(real_values ->> 'first_name') ASC NULLS LAST

PostgreSQL将同一个表连接两次，一次用于获取数据，另一次用于过滤

2条答案

相关问题

热门标签

最新问答