PostgreSQL中的SQL查询帮助

szqfcxe2 于 2023-10-18 发布在 PostgreSQL

关注(0)|答案(2)|浏览(83)

概述

我正在尝试开发一个查询，它可以根据需要选择尽可能多的行，以便更新最多LIMIT unique thing.thing_id s，并按status_1_date列排序（这意味着旧的内容应该首先更新）。thing_id不是主键。

示例架构

CREATE TABLE IF NOT EXISTS thing (
  name VARCHAR(255) PRIMARY KEY,
  thing_id VARCHAR(255),
  c_id VARCHAR(255),
  status VARCHAR(255),
  etag VARCHAR(255),
  last_modified VARCHAR(255),
  size VARCHAR(255),
  status_1_date DATE
);

样本数据

INSERT INTO thing (name,thing_id, c_id,status, status_1_date)
values 
('protocol://full/path/to/thing/thing_1.file1', 'thing_1','c_id', 'status_1', '2023-09-29 09:00:01'),
('protocol://full/path/to/thing/thing_1.file2', 'thing_1','c_id', 'status_1', '2023-09-29 09:00:02'),
('protocol://full/path/to/thing/thing_2.file5', 'thing_2','c_id', 'status_1', '2023-09-29 09:00:02.5'),
  ('protocol://something else', 'thing_1','c_id', 'status_1', '2023-09-29 09:00:02.8'),
('protocol://full/path/to/thing/thing_2.file1', 'thing_2','c_id', 'status_1', '2023-09-29 09:00:03'),
('protocol://full/path/to/thing/thing_2.file2', 'thing_2','c_id', 'status_1', '2023-09-29 09:00:04'),
('protocol://full/path/to/thing/thing_2.file3', 'thing_2','c_id', 'status_1', '2023-09-29 09:00:05'),
('protocol://full/path/to/thing/thing_2.file4', 'thing_2','different', 'status_1', '2023-09-29 09:00:05'),
('protocol://full/path/to/thing/thing_3.file1', 'thing_3','c_id', 'status_1', '2023-09-29 09:00:06'),
('protocol://full/path/to/thing/thing_3.file2', 'thing_3','c_id', 'status_1', '2023-09-29 09:00:06.2'),
('protocol://full/path/to/thing/thing_4.file1', 'thing_4','c_id', 'status_1', '2023-09-29 09:00:06.1'),
('protocol://full/path/to/thing/thing_4.file2', 'thing_4','c_id', 'status_1', '2023-09-29 09:00:06.3'),
('protocol://full/path/to/thing/thing_5.file1', 'thing_5','c_id', 'status_1', '2023-09-29 09:00:06.4'),
('protocol://full/path/to/thing/thing_5.file2', 'thing_5','c_id', 'status_1', '2023-09-29 09:00:06.5'),
('protocol://full/path/to/thing/thing_5.file3', 'thing_5','c_id', 'status_1', '2023-09-29 09:00:06.6'),
('protocol://full/path/to/thing/thing_6.file1', 'thing_6','c_id', 'status_1', '2023-09-29 09:00:06.7');

预期结果

使用这些数据，我试图得到一个查询，当使用LIMIT 3时，它会更新第1-6行和第7-9行。应该更新与there WHERE子句匹配的thing_id s thing_5、thing_6和thing_1的行。

其他详细信息

我试图解决的问题是，我使用一个大型数据库，我需要限制从它返回的数据，但确保我得到一个完整的数据集的实体，我查询。thing.thing_id不是我们数据库中的主键，但是thing_id表示一个实体，它可以由来自N行的数据组成，如共享thing_id所示。我需要确保我获得了所有事物实体的完整行集，其中LIMIT表示我正在获取数据的事物实体的总数。
这个问题是从以前的线程产生的，但没有解决方案：Unable to get Postgresql LIMIT to work with UPDATE and subquery

Answer的问题

它不会获取具有最旧文件的thing_id：https://dbfiddle.uk/GStIbrYA
如果两个针对相同thing.c_id和thing.name前缀的查询同时运行，则会发生双重更新。我正在处理的数据库中的这个更新可能一次处理100，000行，所以这是不可取的。
替代尝试1

看起来可能需要FOR UPDATE来防止双重更新发生，但它不能与窗口函数一起使用。

with ranked_matching_things as (
  select
    dense_rank() over (order by thing_id) as thing_rank,
    name
  from thing
  where c_id = 'c_id'
    and name like 'protocol://full/path/to/thing/%'
  order by status_1_date
  for update
)
update thing
set status = 'CHANGED'
from ranked_matching_things
where thing.name = ranked_matching_things.name
  and ranked_matching_things.thing_rank <= 3

ERROR:  FOR UPDATE is not allowed with window functions

替代尝试2

它不会首先获取具有最旧文件的thing_id：https://dbfiddle.uk/tMG-vvU4
重复的where子句似乎是多余的

UPDATE thing
SET status = 'CHANGED'
FROM (
    SELECT name 
    FROM thing
    WHERE thing.thing_id in (
        SELECT DISTINCT thing.thing_id
        FROM thing
        WHERE c_id = 'c_id' AND name like 'protocol://full/path/to/thing/%' AND status = 'status_1'
        ORDER BY thing.thing_id
        LIMIT (3)
    ) AND thing.c_id = 'c_id' AND thing.name like 'protocol://full/path/to/thing/%' AND thing.status = 'status_1'
    ORDER BY thing.status_1_date
    FOR UPDATE
) AS rows
WHERE thing.name = rows.name 
RETURNING *

测试设置

以下链接可能对测试有用：https://dbfiddle.uk/ZTcbkqsF

postgresql

来源：https://stackoverflow.com/questions/77217752/sql-query-help-in-postgresql

2条答案

按热度按时间

jgzswidk1#

这不能有限制，因为这太...有限。相反，我们可以使用window function。让我们把它分解…
首先，我们识别匹配的行并将它们按顺序排列。

select
  *
from thing
where c_id = 'c_id'
  and name like 'protocol://full/path/to/thing/%'
order by status_1_date;

很简单。
从这些结果中，我们需要选择前3个不同的匹配thing_id。我们可以用dense_rank向结果中添加一列。

select
  dense_rank() over (order by thing_id) as thing_rank,
  *
from thing
where c_id = 'c_id'
  and name like 'protocol://full/path/to/thing/%'
order by status_1_date;

thing_rank  name    thing_id    c_id    status  status_1_date
1   protocol://full/path/to/thing/thing_1.file1     thing_1     c_id    status_1    2023-09-29 09:00:01
1   protocol://full/path/to/thing/thing_1.file2     thing_1     c_id    status_1    2023-09-29 09:00:02
2   protocol://full/path/to/thing/thing_2.file5     thing_2     c_id    status_1    2023-09-29 09:00:02.5
2   protocol://full/path/to/thing/thing_2.file1     thing_2     c_id    status_1    2023-09-29 09:00:03
2   protocol://full/path/to/thing/thing_2.file2     thing_2     c_id    status_1    2023-09-29 09:00:04
2   protocol://full/path/to/thing/thing_2.file3     thing_2     c_id    status_1    2023-09-29 09:00:05
3   protocol://full/path/to/thing/thing_3.file1     thing_3     c_id    status_1    2023-09-29 09:00:06
4   protocol://full/path/to/thing/thing_4.file1     thing_4     c_id    status_1    2023-09-29 09:00:06.1
3   protocol://full/path/to/thing/thing_3.file2     thing_3     c_id    status_1    2023-09-29 09:00:06.2
4   protocol://full/path/to/thing/thing_4.file2     thing_4     c_id    status_1    2023-09-29 09:00:06.3
5   protocol://full/path/to/thing/thing_5.file1     thing_5     c_id    status_1    2023-09-29 09:00:06.4
5   protocol://full/path/to/thing/thing_5.file2     thing_5     c_id    status_1    2023-09-29 09:00:06.5
5   protocol://full/path/to/thing/thing_5.file3     thing_5     c_id    status_1    2023-09-29 09:00:06.6
6   protocol://full/path/to/thing/thing_6.file1     thing_6     c_id    status_1    2023-09-29 09:00:06.7

现在我们有一种方法来识别与前三个thing_id对应的行：thing_1具有等级1，thing_2具有等级2，并且thing_3具有等级3。
我们使用dense_rank，而不是rank，以确保秩是1，2，3.否则我们会有差距。
在主键上与该子查询联接，并且仅更新rank <= 3的子查询。

with ranked_matching_things as (
  select
    dense_rank() over (order by thing_id) as thing_rank,
    name
  from thing
  where c_id = 'c_id'
    and name like 'protocol://full/path/to/thing/%'
  order by status_1_date
)
update thing
set status = 'CHANGED'
from ranked_matching_things
where thing.name = ranked_matching_things.name
  and ranked_matching_things.thing_rank <= 3

Demonstration

赞(0）回复(0）举报 2023-10-18

bvjxkvbb2#

我能够得出以下可能是次优的解决方案。

查询获取与最旧文件关联的有效thing_id列表：

SELECT thing_id, MIN(status_1_date)
FROM thing
WHERE status = ('status_1') AND c_id = ('c_id') AND name LIKE ('protocol://full/path/to/thing/%')
GROUP BY thing_id

查询获取LIMITthing_id s，首先获取文件最旧的thing_id s：

SELECT thing_id
FROM (
    SELECT thing_id, MIN(status_1_date) AS oldest_file
    FROM thing
    GROUP BY thing_id
) as thing_ids
ORDER BY thing_ids.oldest_file
LIMIT 3

查询获取所需行FOR UPDATE：

SELECT * 
FROM thing
WHERE thing.thing_id in (
    SELECT thing_id
        FROM (
            SELECT thing_id, MIN(status_1_date) AS oldest_file
            FROM thing
            GROUP BY thing_id
        ) as thing_ids
    ORDER BY thing_ids.oldest_file
)
FOR UPDATE

查询UPDATE有效行：

WITH rows AS (
    SELECT * 
    FROM thing
    WHERE thing.thing_id in (
        SELECT thing_id
            FROM (
                SELECT thing_id, MIN(status_1_date) AS oldest_file
                FROM thing
                WHERE status = ('status_1') AND c_id = ('c_id') AND name LIKE ('protocol://full/path/to/thing/%')
                GROUP BY thing_id
            ) as thing_ids
        WHERE thing.status = ('status_1') AND thing.c_id = ('c_id') AND thing.name LIKE ('protocol://full/path/to/thing/%')
        ORDER BY thing_ids.oldest_file
        LIMIT 3
    )
    FOR UPDATE
)
UPDATE thing
SET status = 'CHANGED'
FROM rows
WHERE thing.name = rows.name
RETURNING *

解决方案：https://dbfiddle.uk/Z7trYUKa
版本1：

WITH thing_ids as (
    SELECT thing_id, MIN(status_1_date) AS oldest
    FROM thing
    WHERE status = ('status_1') AND c_id = ('c_id') AND name LIKE ('protocol://full/path/to/thing/%')
    GROUP BY thing_id
    ORDER BY oldest
    LIMIT 3
),
names AS (
    SELECT name
    FROM thing
    WHERE thing.thing_id IN (SELECT thing_id FROM thing_ids) AND thing.status = ('status_1') AND thing.c_id = ('c_id') AND thing.name LIKE ('protocol://full/path/to/thing/%')
    FOR UPDATE
)
UPDATE thing
SET status = 'CHANGED'
FROM names
WHERE thing.name = names.name
RETURNING *

版本1解决方案：https://dbfiddle.uk/dg8poZaA

我确信一定有某种方法可以防止多余的WHERE子句，但到目前为止我还无法做到这一点。

修订版2，无重复WHERE：

我可以通过创建一个UNLOGGED表来解决冗余的WHERE子句。
创建所有匹配WHERE子句的潜在行的表：

CREATE UNLOGGED TABLE rows AS (
  SELECT name, thing_id, status_1_date
  FROM thing
  WHERE status = ('status_1') AND c_id = ('c_id') AND name LIKE ('protocol://full/path/to/thing/%')
)

更新查询：

UPDATE thing
SET status = 'CHANGED'
WHERE name IN (
  SELECT name
  FROM thing 
  WHERE name IN (SELECT name FROM rows) AND
  thing_id IN (
    SELECT thing_id
    FROM rows
    GROUP BY thing_id
    ORDER BY MIN(status_1_date)
    LIMIT 3
  )
  FOR UPDATE
)
RETURNING *

赞(0）回复(0）举报 2023-10-18

我来回答

PostgreSQL中的SQL查询帮助

2条答案

相关问题

热门标签

最新问答