PostgreSQL: keeping an up-to-date dictionary of a column's distinct values

Posted by 1l5u6lss on 2023-01-02 in PostgreSQL

I have a table with many columns and millions of rows, such as:

CREATE TABLE foo (
id integer,
thing1 text,
thing2 text,
...
stuff text);

How do I keep a dictionary of the unique values of the stuff column up to date, given that it is initially populated like this:

INSERT INTO stuff_dict SELECT DISTINCT stuff from foo;

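Should I synchronize it manually (checking before every insert/update whether the new stuff value already exists in stuff_dict), or use triggers for insert/update/delete on the foo table? In the latter case, what is the best design for such triggers?
UPDATE: a view does not fit here, because SELECT * FROM stuff_dict should run as fast as possible (when foo has tens of millions of records, even CREATE INDEX ON foo(stuff) does not help much).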

2ledvvac #1

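For a large table, a materialized view seems to be the simplest option.
Refresh the view in a trigger function; the concurrently option (commented out in the code below) can be enabled as well.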

create materialized view stuff_dict as 
    select distinct stuff
    from foo;

create or replace function refresh_stuff_dict()
returns trigger language plpgsql
as $$
begin
    refresh materialized view /*concurrently*/ stuff_dict;
    return null;
end $$;

create trigger refresh_stuff_dict
after insert or update or delete or truncate
on foo for each statement 
execute procedure refresh_stuff_dict();
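
If the concurrently option is enabled, PostgreSQL requires a unique index on the materialized view that uses only column names and covers all rows. A minimal sketch, assuming the view defined above; the index name stuff_dict_stuff_idx is made up for illustration:

```sql
-- Required before REFRESH ... CONCURRENTLY can be used on stuff_dict.
-- The index name is hypothetical; any name works.
create unique index stuff_dict_stuff_idx on stuff_dict (stuff);

-- Readers of stuff_dict are not blocked while this runs.
refresh materialized view concurrently stuff_dict;
```

A concurrent refresh still recomputes the full select distinct query, so it avoids blocking readers rather than making the refresh itself cheaper.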

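While the materialized view solution is simple, it may be suboptimal when the table foo is modified frequently, because every modifying statement triggers a full refresh. In that case, use a regular table as the dictionary. An index on it will help.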

create table stuff_dict as 
    select distinct stuff
    from foo;

create index on stuff_dict(stuff);
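
One possible variation, not part of the answer itself: if the index is declared unique, new values can be added with ON CONFLICT DO NOTHING (PostgreSQL 9.5 or later), which also protects against two sessions inserting the same value at the same time. A sketch under that assumption; the index name and the sample value are hypothetical:

```sql
-- Hypothetical alternative to the plain index: enforce uniqueness.
create unique index stuff_dict_stuff_key on stuff_dict (stuff);

-- Inserting a value that may already exist is then a no-op on duplicates.
insert into stuff_dict (stuff)
values ('some value')
on conflict (stuff) do nothing;
```

Note that a unique index does not treat NULLs as equal by default, so NULL values of stuff would need separate handling.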

The trigger function is more complicated and should fire for each row after insert/update/delete:

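create or replace function refresh_stuff_dict()
returns trigger language plpgsql
as $$
declare
    -- "is distinct from" treats NULL as a comparable value, so an update
    -- from or to NULL is still detected (plain <> would yield NULL here).
    do_insert boolean := tg_op = 'INSERT' or tg_op = 'UPDATE' and new.stuff is distinct from old.stuff;
    do_delete boolean := tg_op = 'DELETE' or tg_op = 'UPDATE' and new.stuff is distinct from old.stuff;
begin
    -- add the new value unless the dictionary already has it
    if do_insert and not exists (select 1 from stuff_dict where stuff = new.stuff) then
        insert into stuff_dict values(new.stuff);
    end if;
    -- drop the old value once no row in foo uses it any more
    if do_delete and not exists (select 1 from foo where stuff = old.stuff) then
        delete from stuff_dict
        where stuff = old.stuff;
    end if;
    -- the return value of an AFTER row trigger is ignored
    return case tg_op when 'DELETE' then old else new end;
end $$;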

create trigger refresh_stuff_dict
after insert or update or delete
on foo for each row
execute procedure refresh_stuff_dict();
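
A quick way to check the row-level trigger, assuming the table-based stuff_dict and the trigger above are in place, that the elided columns of foo are nullable, and that neither the id 1001 nor the value 'brand-new-value' (both made up for this sketch) is already present in foo:

```sql
-- Inserting a row with an unseen value should add it to the dictionary.
insert into foo (id, stuff) values (1001, 'brand-new-value');
select * from stuff_dict where stuff = 'brand-new-value';  -- expect one row

-- Deleting the only row that uses the value should remove it again.
delete from foo where id = 1001;
select * from stuff_dict where stuff = 'brand-new-value';  -- expect no rows
```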
