PostgreSQL将同一个表连接两次,一次用于获取数据,另一次用于过滤

js4nwp54  于 2022-11-29  发布在  PostgreSQL
关注(0)|答案(2)|浏览(196)

这是我的表结构:

PERSON (4.1 M rows)
id : UUID
notes: string
...
---

ATTRIBUTES (21 M rows)
id:             UUID
person_id:      <persons id> non null
name:           ENUM : full_name | birthday | email | phone | photo
value:          JSONB : {value:""} | { first_name:""} | {arbitrary schema based on `name`}
edited_value:   JSONB : {value:""} | { first_name:""} | {arbitrary schema based on `name`}
...

---

1 Person  -> N Attributes

因此,现在查询响应应该如下所示:
| 人员标识|注解|属性|
| - -|- -|- -|
| 唯一标识符|一些东西|[{a1}、{a2}...(所有属性)]|
| 唯一标识符|一些东西|[{a1}、{a2}...(所有属性)]|
问题是我需要根据特定条件对列表进行排序和筛选。

  • 基于ATTRIBUTE->edited_value排序||特定ATTRIBUTE->nameATTRIBUTE->value
  • 如按名字排序(因此按ATTRIBUTES中的行排序,其中name=first_name)
  • Attributes -> name筛选
  • 比如GET所有有生日的人(但是输出的attributes列应该有所有属性)

我目前的查询如下所示:
1.获取具有birthday的所有人员并按其first_name排序
我现在拥有的查询看起来像这样,但它需要5秒钟在本地执行

SELECT
    p.id,
    JSONB_AGG(a.*) as attributes,
FROM
    person p
LEFT JOIN
    "attribute" a ON a.person_id = p.id
                  AND a.deleted is false
LEFT JOIN 
    "attribute" ba ON ba.name = 'birthday'
                   AND ba.person_id = p.id
                   AND ba.deleted is false
LEFT JOIN 
    "attribute" fa ON fa.name = 'full_name'
                   AND fa.person_id = p.id
                   AND fa.deleted is false
WHERE
    p.som_col_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'
    AND ba.id IS NOT NULL
    AND p.deleted_at IS NULL
GROUP BY
    p.id
ORDER BY
    LOWER(MAX(COALESCE(fa.edited_value, fa.value)->>'first_name')) ASC NULLS LAST

除了attribute.deleted之外,我对where子句上几乎所有的列都有一个索引。
从解释计划来看,排序似乎要花很多时间

Sort  (cost=46184.33..46185.19 rows=344 width=80) (actual time=284.351..284.604 rows=115 loops=1)                                                                                           |
  Sort Key: (lower(max((COALESCE(fa.edited_value, fa.value) ->> 'first_name'::text))))                                                                                                      |
  Sort Method: quicksort  Memory: 1750kB                                                                                                                                                    |
  Buffers: shared hit=7685 read=29921, temp read=314 written=315                                                                                                                            |
  ->  GroupAggregate  (cost=46159.52..46169.84 rows=344 width=80) (actual time=258.759..283.345 rows=115 loops=1)                                                                           |
        Group Key: p.id                                                                                                                                                                     |
        Buffers: shared hit=7685 read=29921, temp read=314 written=315                                                                                                                      |
        ->  Sort  (cost=46159.52..46160.38 rows=344 width=116) (actual time=258.709..261.751 rows=15967 loops=1)                                                                            |
              Sort Key: p.id                                                                                                                                                                |
              Sort Method: external merge  Disk: 2512kB                                                                                                                                     |
              Buffers: shared hit=7685 read=29921, temp read=314 written=315                                                                                                                |
              ->  Gather  (cost=3553.65..46145.03 rows=344 width=116) (actual time=35.024..243.703 rows=15967 loops=1)                                                                      |
                    Workers Planned: 2                                                                                                                                                      |
                    Workers Launched: 2                                                                                                                                                     |
                    Buffers: shared hit=7685 read=29921                                                                                                                                     |
                    ->  Nested Loop Left Join  (cost=2553.65..45110.63 rows=143 width=116) (actual time=19.666..108.066 rows=5322 loops=3)                                                  |
                          Buffers: shared hit=7685 read=29921                                                                                                                               |
                          ->  Nested Loop Left Join  (cost=2553.23..41298.71 rows=52 width=80) (actual time=18.861..100.495 rows=38 loops=3)                                                |
                                Buffers: shared hit=6341 read=29921                                                                                                                         |
                                ->  Parallel Hash Join  (cost=2552.80..40360.76 rows=52 width=16) (actual time=18.794..99.999 rows=38 loops=3)                                              |
                                      Hash Cond: (ba.contact_id = p.id)                                                                                                                     |
                                      Buffers: shared hit=5879 read=29921                                                                                                                   |
                                      ->  Parallel Seq Scan on attribute ba  (cost=0.00..37806.51 rows=554 width=16) (actual time=0.624..94.651 rows=1072 loops=3)                          |
                                            Filter: ((deleted IS FALSE) AND (id IS NOT NULL) AND (name = 'birthday'::attribute_name))                                                       |
                                            Rows Removed by Filter: 177008                                                                                                                  |
                                            Buffers: shared hit=5103 read=29921                                                                                                             |
                                      ->  Parallel Hash  (cost=2486.66..2486.66 rows=5291 width=16) (actual time=3.893..3.894 rows=776 loops=3)                                             |
                                            Buckets: 16384  Batches: 1  Memory Usage: 256kB                                                                                                 |
                                            Buffers: shared hit=692                                                                                                                         |
                                            ->  Parallel Bitmap Heap Scan on contact p  (cost=504.00..2486.66 rows=5291 width=16) (actual time=1.132..10.831 rows=2329 loops=1)             |
                                                  Recheck Cond: (user_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'::uuid)                                                                    |
                                                  Filter: (deleted_at IS NULL)                                                                                                              |
                                                  Rows Removed by Filter: 9180                                                                                                              |
                                                  Heap Blocks: exact=568                                                                                                                    |
                                                  Buffers: shared hit=692                                                                                                                   |
                                                  ->  Bitmap Index Scan on contact_user_id_index  (cost=0.00..501.75 rows=11378 width=0) (actual time=1.014..1.014 rows=11526 loops=1)      |
                                                        Index Cond: (user_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'::uuid)                                                                |
                                                        Buffers: shared hit=120                                                                                                             |
                                ->  Index Scan using attribute_contact_id_name_deleted_index on attribute fa  (cost=0.42..17.97 rows=7 width=80) (actual time=0.011..0.011 rows=1 loops=115)|
                                      Index Cond: ((contact_id = p.id) AND (name = 'full_name'::attribute_name) AND (deleted = false))                                                      |
                                      Buffers: shared hit=462                                                                                                                               |
                          ->  Index Scan using attribute_contact_id_name_deleted_index on attribute a  (cost=0.42..59.95 rows=1336 width=52) (actual time=0.004..0.059 rows=139 loops=115)  |
                                Index Cond: ((contact_id = p.id) AND (deleted = false))                                                                                                     |
                                Buffers: shared hit=1344                                                                                                                                    |
Planning Time: 0.467 ms                                                                                                                                                                     |
Execution Time: 285.484 ms                                                                                                                                                                  |

你们能给予我一些关于我能做的改进的建议吗?
谢谢
编辑1:使用分析、缓冲区更新了说明编辑2:更新查询以删除歧义

1bqhqjot

1bqhqjot1#

如果你不把属性聚合到JSON数组中(例如jsonb[]),而是聚合到一个常规的JSON值中,你可以直接测试birthday属性是否存在。

SELECT p.id,
       a.attributes
FROM person p
  LEFT JOIN ( 
    select person_id, 
           jsonb_object_agg(att.name, att.canonical_value)) as attributes,
           jsonb_object_agg(att.name, MAX(COALESCE(fa.edited_value, fa.value))) as attributes,
    from attribute att
    where att.deleted = false
    group by att.person_id 
  ) a ON a.person_id = p.id
  LEFT JOIN 
      "attribute" fa ON fa.name = 'full_name'
                     AND fa.person_id = p.id
                     AND fa.deleted is false
WHERE p.som_col_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'
  AND a.attributes ? 'birthday'
  AND p.deleted_at IS NULL
ORDER BY LOWER(MAX(COALESCE(fa.edited_value, fa.value)->>'first_name')) ASC NULLS LAST

我发现canonical_valuevalueedited_value有点混乱。
但是,也许您可以删除full_name的左连接,方法是在聚合中创建另一个包含正确first_name的JSONB对象,然后按顺序引用该对象。

SELECT p.id,
       a.attributes
FROM person p
  LEFT JOIN ( 
    select att.person_id, 
           jsonb_object_agg(att.name, att.canonical_value) as attributes,
           jsonb_object_agg(att.name, COALESCE(att.edited_value, fa.value)) as real_values
    from attribute att
    where att.deleted = false
    group by att.person_id 
  ) a ON a.person_id = p.id
WHERE p.som_col_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'
  AND attributes ? 'birthday'
  AND p.deleted_at IS NULL
ORDER BY LOWER(real_values ->> 'first_name') ASC NULLS LAST
1l5u6lss

1l5u6lss2#

您的数据模型不是很好,而且在问题的过程中,您的表名和列名不断变化的方式非常令人困惑。
但看起来您确实需要一个以“name”开头的“attribute”的索引,或者可能需要一个以WHERE deleted is false开头的部分索引。
这应该有助于改进您向我们展示的计划的查询的执行,但是由于这只花费了0.3秒,而不是5秒,所以很难知道它会对看不见的5秒查询计划做什么。

相关问题