postgresql 联机重新分区现有表

63lcw9qa  于 2023-03-17  发布在  PostgreSQL
关注(0)|答案(1)|浏览(142)

表架构

我有一个按天分区的分区表“solutions”,如下所示:
解决方案:

  • 解决方案_20230115
  • 解决方案_20230116
  • 解决方案_20230117

...

  • 解决方案_20230314
  • 解决方案_20230315
  • 解决方案_旧版

每天都会为新数据创建新分区'solutuons_yyyymmdd'。分区'solutions_legacy'存储表'solutions'分区之前的时间段内的旧数据,当前为从MINVALUE到'2023-01- 15'。
我的目标
我必须为旧数据创建新分区,并将数据从分区“solutions_legacy”迁移到新创建的分区。
我的困难
问题是我必须在线完成,以便在迁移数据时最终用户可以使用表“解决方案”(至少是最近的分区)。

我的解决方案

这个想法是将'solutions_legacy'从'solutions'中分离出来,并将其作为一个单独的表使用,而'solutions'则可供最终用户使用。

  • 分离“解决方案_旧版”
  • 创建新表'solutions_yyyymmdd'
  • 将相应的数据从“解决方案_旧版”迁移到“解决方案_yyyymmdd”
  • 将两个表作为分区附加到“解决方案”

这样,我想只有在第一步分离“solutions_legacy”时才会有一些停机时间。

我的代码

create or replace PROCEDURE attach_partition(p_name text, p_start date default '2020-01-01', p_end date default '2099-01-01') 
as $$
begin
  RAISE NOTICE '%: Attaching partition ''%''...', clock_timestamp(), p_name;
  execute format(
    'ALTER TABLE solutions ATTACH PARTITION %I
    FOR VALUES FROM (%L) TO (%L)', p_name, p_start, p_end);
  RAISE NOTICE '% ''%'' has been attached', clock_timestamp(), p_name;
end;
$$ LANGUAGE plpgsql;

create or replace PROCEDURE detach_partition(p_name text)
as $$
begin
  RAISE NOTICE '%: Detaching ''%''...', clock_timestamp(), p_name;
  execute format('ALTER TABLE solutions DETACH PARTITION %I', p_name);
  RAISE NOTICE '%: ''%'' has been detached', clock_timestamp(), p_name;
end;
$$ LANGUAGE plpgsql;

create or replace PROCEDURE create_table(p_name text, p_start date, p_end date)
as $$
begin
  RAISE NOTICE '%: Creating new table ''%'' for values from % to %...', clock_timestamp(), p_name, p_start, p_end;
  execute format('CREATE TABLE IF NOT EXISTS %I
    (LIKE solutions INCLUDING DEFAULTS)', p_name);
  RAISE NOTICE '%: Table ''%'' has been created', clock_timestamp(), p_name;
end;
$$ LANGUAGE plpgsql;

create or replace PROCEDURE migrate_data(p_name text, p_start date, p_end date)
as $$
begin
  RAISE NOTICE '%: Migrating data to ''%''...', clock_timestamp(), p_name;
  execute format('
    WITH moved_rows AS (
      DELETE FROM public.solutions_legacy sl
      WHERE created_at >= %L
        and created_at < %L
      RETURNING sl.*
    )
    INSERT INTO %I
    SELECT * FROM moved_rows', p_start, p_end, p_name);
  RAISE NOTICE '%: Data has been migrated', clock_timestamp();
end;
$$ LANGUAGE plpgsql;

do
$$
declare 
  new_partitions_count int := 3;
  rec record;
  partition_start date;
  partition_end   date;
  partition_name  text;
begin
  call detach_partition('solutions_legacy');
  -- !!! HERE IS WHERE I TRY TO COMMIT
  RAISE NOTICE '%: Querying ''solutions_legacy'' to determine % new partitions...', clock_timestamp(), new_partitions_count;
  for rec in 
    select distinct date_trunc('day', created_at)::date d 
    from public.solutions_legacy 
    order by d desc limit new_partitions_count
  loop
    partition_start := rec.d;
    partition_end   := rec.d + 1;
    partition_name  := 'solutions_' || TO_CHAR(partition_start, 'yyyymmdd') ;
    call create_table(partition_name, partition_start, partition_end);
    call migrate_data(partition_name, partition_start, partition_end);
    call attach_partition(partition_name, partition_start, partition_end);
  end loop;
  call attach_partition('solutions_legacy', p_end => partition_start);
end;
$$ LANGUAGE plpgsql;

下面是一个日志生成器示例,以便更好地理解:

NOTICE:  2023-03-15 10:28:02.306654+00: Detaching 'solutions_legacy'...
NOTICE:  2023-03-15 10:31:07.60235+00: 'solutions_legacy' has been detached
NOTICE:  2023-03-15 10:31:07.602576+00: Querying 'solutions_legacy' to determine 3 new partitions...
NOTICE:  2023-03-15 10:33:29.410183+00: Creating new table 'solutions_20230114' for values from 2023-01-14 to 2023-01-15...
NOTICE:  2023-03-15 10:33:29.416459+00: Table 'solutions_20230114' has been created
NOTICE:  2023-03-15 10:33:29.417196+00: Migrating data to 'solutions_20230114'...
NOTICE:  2023-03-15 10:36:14.731551+00: Data has been migrated
NOTICE:  2023-03-15 10:36:14.732235+00: Attaching partition 'solutions_20230114'...
NOTICE:  2023-03-15 10:37:51.100635+00 'solutions_20230114' has been attached
NOTICE:  2023-03-15 10:37:51.100761+00: Creating new table 'solutions_20230113' for values from 2023-01-13 to 2023-01-14...
NOTICE:  2023-03-15 10:37:51.102272+00: Table 'solutions_20230113' has been created
NOTICE:  2023-03-15 10:37:51.102315+00: Migrating data to 'solutions_20230113'...
NOTICE:  2023-03-15 10:40:10.749074+00: Data has been migrated
NOTICE:  2023-03-15 10:40:10.749182+00: Attaching partition 'solutions_20230113'...
NOTICE:  2023-03-15 10:41:47.559556+00 'solutions_20230113' has been attached
NOTICE:  2023-03-15 10:41:47.559661+00: Creating new table 'solutions_20230112' for values from 2023-01-12 to 2023-01-13...
NOTICE:  2023-03-15 10:41:47.561249+00: Table 'solutions_20230112' has been created
NOTICE:  2023-03-15 10:41:47.561291+00: Migrating data to 'solutions_20230112'...
NOTICE:  2023-03-15 10:44:07.43721+00: Data has been migrated
NOTICE:  2023-03-15 10:44:07.43729+00: Attaching partition 'solutions_20230112'...
NOTICE:  2023-03-15 10:45:41.37163+00 'solutions_20230112' has been attached
NOTICE:  2023-03-15 10:45:41.371922+00: Attaching partition 'solutions_legacy'...
NOTICE:  2023-03-15 10:47:56.991827+00 'solutions_legacy' has been attached
DO

Query returned successfully in 20 min.

我的问题

即使数据成功迁移到新创建的分区,在迁移过程中,最终用户也无法使用表“solutions”,甚至SELECT查询也在等待锁。我想这是因为整个脚本是一个事务。因此,我尝试在调用detach_partition('solutions_legacy')之后或在此过程的最后提交更改,但在大多数情况下,我会收到错误消息
错误:事务终止无效
我试过在一些地方添加COMMIT,也试过使用开始/END语句。
解决这个问题的正确方法是什么?
基于x86_64-pc-linux-musl的PostgreSQL 15.2,由gcc编译(阿尔卑斯山12.2.1_git20220924-r4)12.2.1 20220924,64位

t30tvxxf

t30tvxxf1#

DO块允许事务控制当且仅当您没有将其 Package 在事务中时。
我建议在事务的外部执行do块commit语句应该按计划工作。
编辑:参见https://www.postgresql.org/docs/current/sql-do.html#id-1.9.3.102.8获取文档。

相关问题