PostgreSQL中的SQL查询帮助

szqfcxe2  于 2023-10-18  发布在  PostgreSQL
关注(0)|答案(2)|浏览(81)

概述

我正在尝试开发一个查询,它可以根据需要选择尽可能多的行,以便更新最多LIMIT unique thing.thing_id s,并按status_1_date列排序(这意味着旧的内容应该首先更新)。thing_id不是主键。

示例架构

CREATE TABLE IF NOT EXISTS thing (
  name VARCHAR(255) PRIMARY KEY,
  thing_id VARCHAR(255),
  c_id VARCHAR(255),
  status VARCHAR(255),
  etag VARCHAR(255),
  last_modified VARCHAR(255),
  size VARCHAR(255),
  status_1_date DATE
);

样本数据

INSERT INTO thing (name,thing_id, c_id,status, status_1_date)
values 
('protocol://full/path/to/thing/thing_1.file1', 'thing_1','c_id', 'status_1', '2023-09-29 09:00:01'),
('protocol://full/path/to/thing/thing_1.file2', 'thing_1','c_id', 'status_1', '2023-09-29 09:00:02'),
('protocol://full/path/to/thing/thing_2.file5', 'thing_2','c_id', 'status_1', '2023-09-29 09:00:02.5'),
  ('protocol://something else', 'thing_1','c_id', 'status_1', '2023-09-29 09:00:02.8'),
('protocol://full/path/to/thing/thing_2.file1', 'thing_2','c_id', 'status_1', '2023-09-29 09:00:03'),
('protocol://full/path/to/thing/thing_2.file2', 'thing_2','c_id', 'status_1', '2023-09-29 09:00:04'),
('protocol://full/path/to/thing/thing_2.file3', 'thing_2','c_id', 'status_1', '2023-09-29 09:00:05'),
('protocol://full/path/to/thing/thing_2.file4', 'thing_2','different', 'status_1', '2023-09-29 09:00:05'),
('protocol://full/path/to/thing/thing_3.file1', 'thing_3','c_id', 'status_1', '2023-09-29 09:00:06'),
('protocol://full/path/to/thing/thing_3.file2', 'thing_3','c_id', 'status_1', '2023-09-29 09:00:06.2'),
('protocol://full/path/to/thing/thing_4.file1', 'thing_4','c_id', 'status_1', '2023-09-29 09:00:06.1'),
('protocol://full/path/to/thing/thing_4.file2', 'thing_4','c_id', 'status_1', '2023-09-29 09:00:06.3'),
('protocol://full/path/to/thing/thing_5.file1', 'thing_5','c_id', 'status_1', '2023-09-29 09:00:06.4'),
('protocol://full/path/to/thing/thing_5.file2', 'thing_5','c_id', 'status_1', '2023-09-29 09:00:06.5'),
('protocol://full/path/to/thing/thing_5.file3', 'thing_5','c_id', 'status_1', '2023-09-29 09:00:06.6'),
('protocol://full/path/to/thing/thing_6.file1', 'thing_6','c_id', 'status_1', '2023-09-29 09:00:06.7');

预期结果

使用这些数据,我试图得到一个查询,当使用LIMIT 3时,它会更新第1-6行和第7-9行。应该更新与there WHERE子句匹配的thing_id s thing_5thing_6thing_1的行。

其他详细信息

我试图解决的问题是,我使用一个大型数据库,我需要限制从它返回的数据,但确保我得到一个完整的数据集的实体,我查询。thing.thing_id不是我们数据库中的主键,但是thing_id表示一个实体,它可以由来自N行的数据组成,如共享thing_id所示。我需要确保我获得了所有事物实体的完整行集,其中LIMIT表示我正在获取数据的事物实体的总数。
这个问题是从以前的线程产生的,但没有解决方案:Unable to get Postgresql LIMIT to work with UPDATE and subquery

Answer的问题

  • 它不会获取具有最旧文件的thing_idhttps://dbfiddle.uk/GStIbrYA
  • 如果两个针对相同thing.c_idthing.name前缀的查询同时运行,则会发生双重更新。我正在处理的数据库中的这个更新可能一次处理100,000行,所以这是不可取的。
    替代尝试1

看起来可能需要FOR UPDATE来防止双重更新发生,但它不能与窗口函数一起使用。

with ranked_matching_things as (
  select
    dense_rank() over (order by thing_id) as thing_rank,
    name
  from thing
  where c_id = 'c_id'
    and name like 'protocol://full/path/to/thing/%'
  order by status_1_date
  for update
)
update thing
set status = 'CHANGED'
from ranked_matching_things
where thing.name = ranked_matching_things.name
  and ranked_matching_things.thing_rank <= 3

ERROR:  FOR UPDATE is not allowed with window functions

替代尝试2

UPDATE thing
SET status = 'CHANGED'
FROM (
    SELECT name 
    FROM thing
    WHERE thing.thing_id in (
        SELECT DISTINCT thing.thing_id
        FROM thing
        WHERE c_id = 'c_id' AND name like 'protocol://full/path/to/thing/%' AND status = 'status_1'
        ORDER BY thing.thing_id
        LIMIT (3)
    ) AND thing.c_id = 'c_id' AND thing.name like 'protocol://full/path/to/thing/%' AND thing.status = 'status_1'
    ORDER BY thing.status_1_date
    FOR UPDATE
) AS rows
WHERE thing.name = rows.name 
RETURNING *

测试设置

以下链接可能对测试有用:https://dbfiddle.uk/ZTcbkqsF

jgzswidk

jgzswidk1#

这不能有限制,因为这太...有限。相反,我们可以使用window function。让我们把它分解…
首先,我们识别匹配的行并将它们按顺序排列。

select
  *
from thing
where c_id = 'c_id'
  and name like 'protocol://full/path/to/thing/%'
order by status_1_date;

很简单。
从这些结果中,我们需要选择前3个不同的匹配thing_id。我们可以用dense_rank向结果中添加一列。

select
  dense_rank() over (order by thing_id) as thing_rank,
  *
from thing
where c_id = 'c_id'
  and name like 'protocol://full/path/to/thing/%'
order by status_1_date;
thing_rank  name    thing_id    c_id    status  status_1_date
1   protocol://full/path/to/thing/thing_1.file1     thing_1     c_id    status_1    2023-09-29 09:00:01
1   protocol://full/path/to/thing/thing_1.file2     thing_1     c_id    status_1    2023-09-29 09:00:02
2   protocol://full/path/to/thing/thing_2.file5     thing_2     c_id    status_1    2023-09-29 09:00:02.5
2   protocol://full/path/to/thing/thing_2.file1     thing_2     c_id    status_1    2023-09-29 09:00:03
2   protocol://full/path/to/thing/thing_2.file2     thing_2     c_id    status_1    2023-09-29 09:00:04
2   protocol://full/path/to/thing/thing_2.file3     thing_2     c_id    status_1    2023-09-29 09:00:05
3   protocol://full/path/to/thing/thing_3.file1     thing_3     c_id    status_1    2023-09-29 09:00:06
4   protocol://full/path/to/thing/thing_4.file1     thing_4     c_id    status_1    2023-09-29 09:00:06.1
3   protocol://full/path/to/thing/thing_3.file2     thing_3     c_id    status_1    2023-09-29 09:00:06.2
4   protocol://full/path/to/thing/thing_4.file2     thing_4     c_id    status_1    2023-09-29 09:00:06.3
5   protocol://full/path/to/thing/thing_5.file1     thing_5     c_id    status_1    2023-09-29 09:00:06.4
5   protocol://full/path/to/thing/thing_5.file2     thing_5     c_id    status_1    2023-09-29 09:00:06.5
5   protocol://full/path/to/thing/thing_5.file3     thing_5     c_id    status_1    2023-09-29 09:00:06.6
6   protocol://full/path/to/thing/thing_6.file1     thing_6     c_id    status_1    2023-09-29 09:00:06.7

现在我们有一种方法来识别与前三个thing_id对应的行:thing_1具有等级1,thing_2具有等级2,并且thing_3具有等级3。
我们使用dense_rank,而不是rank,以确保秩是1,2,3.否则我们会有差距。
在主键上与该子查询联接,并且仅更新rank <= 3的子查询。

with ranked_matching_things as (
  select
    dense_rank() over (order by thing_id) as thing_rank,
    name
  from thing
  where c_id = 'c_id'
    and name like 'protocol://full/path/to/thing/%'
  order by status_1_date
)
update thing
set status = 'CHANGED'
from ranked_matching_things
where thing.name = ranked_matching_things.name
  and ranked_matching_things.thing_rank <= 3

Demonstration

bvjxkvbb

bvjxkvbb2#

我能够得出以下可能是次优的解决方案。

查询获取与最旧文件关联的有效thing_id列表:

SELECT thing_id, MIN(status_1_date)
FROM thing
WHERE status = ('status_1') AND c_id = ('c_id') AND name LIKE ('protocol://full/path/to/thing/%')
GROUP BY thing_id

查询获取LIMITthing_id s,首先获取文件最旧的thing_id s:

SELECT thing_id
FROM (
    SELECT thing_id, MIN(status_1_date) AS oldest_file
    FROM thing
    GROUP BY thing_id
) as thing_ids
ORDER BY thing_ids.oldest_file
LIMIT 3

查询获取所需行FOR UPDATE

SELECT * 
FROM thing
WHERE thing.thing_id in (
    SELECT thing_id
        FROM (
            SELECT thing_id, MIN(status_1_date) AS oldest_file
            FROM thing
            GROUP BY thing_id
        ) as thing_ids
    ORDER BY thing_ids.oldest_file
)
FOR UPDATE

查询UPDATE有效行:

WITH rows AS (
    SELECT * 
    FROM thing
    WHERE thing.thing_id in (
        SELECT thing_id
            FROM (
                SELECT thing_id, MIN(status_1_date) AS oldest_file
                FROM thing
                WHERE status = ('status_1') AND c_id = ('c_id') AND name LIKE ('protocol://full/path/to/thing/%')
                GROUP BY thing_id
            ) as thing_ids
        WHERE thing.status = ('status_1') AND thing.c_id = ('c_id') AND thing.name LIKE ('protocol://full/path/to/thing/%')
        ORDER BY thing_ids.oldest_file
        LIMIT 3
    )
    FOR UPDATE
)
UPDATE thing
SET status = 'CHANGED'
FROM rows
WHERE thing.name = rows.name
RETURNING *

解决方案:https://dbfiddle.uk/Z7trYUKa
版本1:

WITH thing_ids as (
    SELECT thing_id, MIN(status_1_date) AS oldest
    FROM thing
    WHERE status = ('status_1') AND c_id = ('c_id') AND name LIKE ('protocol://full/path/to/thing/%')
    GROUP BY thing_id
    ORDER BY oldest
    LIMIT 3
),
names AS (
    SELECT name
    FROM thing
    WHERE thing.thing_id IN (SELECT thing_id FROM thing_ids) AND thing.status = ('status_1') AND thing.c_id = ('c_id') AND thing.name LIKE ('protocol://full/path/to/thing/%')
    FOR UPDATE
)
UPDATE thing
SET status = 'CHANGED'
FROM names
WHERE thing.name = names.name
RETURNING *

版本1解决方案:https://dbfiddle.uk/dg8poZaA

我确信一定有某种方法可以防止多余的WHERE子句,但到目前为止我还无法做到这一点。

修订版2,无重复WHERE

我可以通过创建一个UNLOGGED表来解决冗余的WHERE子句。
创建所有匹配WHERE子句的潜在行的表:

CREATE UNLOGGED TABLE rows AS (
  SELECT name, thing_id, status_1_date
  FROM thing
  WHERE status = ('status_1') AND c_id = ('c_id') AND name LIKE ('protocol://full/path/to/thing/%')
)

更新查询:

UPDATE thing
SET status = 'CHANGED'
WHERE name IN (
  SELECT name
  FROM thing 
  WHERE name IN (SELECT name FROM rows) AND
  thing_id IN (
    SELECT thing_id
    FROM rows
    GROUP BY thing_id
    ORDER BY MIN(status_1_date)
    LIMIT 3
  )
  FOR UPDATE
)
RETURNING *

相关问题