如果子查询在postgres中返回多行，则跳过行

t98cgbkg 于 2021-07-24 发布在 Java

关注(0)|答案(2)|浏览(358)

我想更新表 prod_replay_out 基于postgres中的子查询结果。不过，子查询返回多行，但我想跳过这些行，并基于子查询返回的单行更新表。
我已引用链接子查询返回超过1行错误，但 max() 函数将不适用于我的预期结果。你能给我一些修改查询的建议吗？谢谢您。 prod_replay_out 包含以下列：
卖家，买家，发件人标签，卖家标签，买家标签，isin，数量，输入消息时间，消息类型，cdsx时间 prod_replay_in 包含以下列：
卖家、买家、发件人标签、卖家标签、买家标签、isin、数量、消息类型、cdsx时间
我试过什么？
请查找以下更新sql：
更新sql:

update prod_replay_out O  
   set in_msg_id = 
        (Select id
           From prod_replay_in I
          Where I.msg_type   = 'CDST010'
            and I.seller     = O.seller
            and I.buyer      = O.buyer
            and I.sender_tag = O.sender_tag
            and I.seller_tag = O.seller_tag
            and I.buyer_tag  = O.buyer_tag
            and I.isin       = O.isin
            and I.quantity   = O.quantity
            and I.cdsx_time  = O.in_msg_time
            and I.cdsx_time::text like '2020-05-12%'
         ) 
where O.msg_type = 'CDST01C'
and O.cdsx_time::text like '2020-05-12%';

我试过下面的方法。这是正确的方法还是有漏洞？

update prod_replay_out O  
   set in_msg_id = 
        (Select id
           From prod_replay_in I
          Where I.msg_type   = 'CDST010'
            and I.seller     = O.seller
            and I.buyer      = O.buyer
            and I.sender_tag = O.sender_tag
            and I.seller_tag = O.seller_tag
            and I.buyer_tag  = O.buyer_tag
            and I.isin       = O.isin
            and I.quantity   = O.quantity
            and I.cdsx_time  = O.in_msg_time
            and I.cdsx_time::text like '2020-05-12%'
            and 1 = (Select count(id)
                       From prod_replay_in I
                      Where I.msg_type   = 'CDST010'
                        and I.seller     = O.seller
                        and I.buyer      = O.buyer
                        and I.sender_tag = O.sender_tag
                        and I.seller_tag = O.seller_tag
                        and I.buyer_tag  = O.buyer_tag
                        and I.isin       = O.isin
                        and I.quantity   = O.quantity
                        and I.cdsx_time  = O.in_msg_time
                        and I.cdsx_time::text like '2020-05-12%'
                    )    
                )
where O.msg_type = 'CDST01C'
  and O.cdsx_time::text like '2020-05-12%';

sql postgresql subquery select sql-update

来源：https://stackoverflow.com/questions/62518120/skip-rows-if-subquery-returns-multiple-rows-in-postgres

2条答案

按热度按时间

31moq8wy1#

您只想在子查询返回一行时更新。其中一个选项使用了aggregation和 having 在子查询中：

update prod_replay_out o  
set in_msg_id = (
    select max(id)
    from prod_replay_in i
    where 
        i.msg_type       = 'cdst010'
        and i.seller     = o.seller
        and i.buyer      = o.buyer
        and i.sender_tag = o.sender_tag
        and i.seller_tag = o.seller_tag
        and i.buyer_tag  = o.buyer_tag
        and i.isin       = o.isin
        and i.quantity   = o.quantity
        and i.cdsx_time  = o.in_msg_time
        and i.cdsx_time  >= '2020-05-12'::date
        and i.cdsx_time  <  '2020-05-13'::date
    having count(*) = 1
) 
where 
    o.msg_type = 'cdst01c'
    and o.cdsx_time  >= '2020-05-12'::date
    and o.cdsx_time  <  '2020-05-13'::date

请注意，我重写了日期过滤器，以避免转换为文本（您可以使用日期文本的半开放间隔，这要有效得多）。
请注意，此更新 in_msg_id 至 null 当子查询将返回多行（或者根本没有行）时。如果要避免这种情况，可以在 where 条款：

update prod_replay_out o  
set in_msg_id = (
    select max(id)
    from prod_replay_in i
    where 
        i.msg_type       = 'cdst010'
        and i.seller     = o.seller
        and i.buyer      = o.buyer
        and i.sender_tag = o.sender_tag
        and i.seller_tag = o.seller_tag
        and i.buyer_tag  = o.buyer_tag
        and i.isin       = o.isin
        and i.quantity   = o.quantity
        and i.cdsx_time  = o.in_msg_time
        and i.cdsx_time  >= '2020-05-12'::date
        and i.cdsx_time  <  '2020-05-13'::date
    having count(*) = 1
) 
where 
    o.msg_type = 'cdst01c'
    and o.cdsx_time  >= '2020-05-12'::date
    and o.cdsx_time  <  '2020-05-13'::date
    and (
        select count(*)
        from prod_replay_in i
        where 
            i.msg_type       = 'cdst010'
            and i.seller     = o.seller
            and i.buyer      = o.buyer
            and i.sender_tag = o.sender_tag
            and i.seller_tag = o.seller_tag
            and i.buyer_tag  = o.buyer_tag
            and i.isin       = o.isin
            and i.quantity   = o.quantity
            and i.cdsx_time  = o.in_msg_time
            and i.cdsx_time  >= '2020-05-12'::date
            and i.cdsx_time  <  '2020-05-13'::date
    ) = 1

赞(0）回复(0）举报 2021-07-24

kyvafyod2#

查询

最重要的是，不要使用相关子查询。这是做这项工作的下等工具。在中使用子查询 FROM 条款。
这只更新在源表中找到单个匹配候选行的位置（既不是无行也不是多行），也只更新实际更改值的位置：

UPDATE prod_replay_out o
SET    in_msg_id = i.id
FROM  (
   SELECT i.id, i.seller, i.buyer, i.sender_tag, i.seller_tag, i.buyer_tag, i.isin, i.quantity, i.cdsx_time
   FROM   prod_replay_in i
   WHERE  i.msg_type   = 'CDST010'
   AND    i.cdsx_time >= '2020-05-12'     -- ① "sargable" expression
   AND    i.cdsx_time <  '2020-05-13'     -- ② don't cast to date, it's a valid timestamp literal
   AND    NOT EXISTS (                    -- ③ EXISTS is typically faster than counting
      SELECT FROM prod_replay_in x
      WHERE  x.id <> i.id                 -- ④ unique
      AND   (i.seller, i.buyer, i.sender_tag, i.seller_tag, i.buyer_tag, i.isin, i.quantity, i.cdsx_time)  -- ⑤ short syntax
        =   (x.seller, x.buyer, x.sender_tag, x.seller_tag, x.buyer_tag, x.isin, x.quantity, x.cdsx_time)
      )
   ) i
WHERE  o.msg_type = 'CDST01C'
AND   (i.seller, i.buyer, i.sender_tag, i.seller_tag, i.buyer_tag, i.isin, i.quantity, i.cdsx_time)
  =   (o.seller, o.buyer, o.sender_tag, o.seller_tag, o.buyer_tag, o.isin, o.quantity, o.in_msg_time)  -- ⑥ o.cdsx_time?
-- AND    o.cdsx_time >= '2020-05-12'     -- ⑦ redundant
-- AND    o.cdsx_time <  '2020-05-13'
AND   o.in_msg_id IS DISTINCT FROM i.id   -- ⑧ avoid empty updates
;

① 像gmb已经建议的那样，将这个 predicate 转换为“sargable”表达式。这通常更快，并且可以使用索引支持。
“sargable”这个词到底是什么意思？
计算sql中的并发事件数
② 但不要投 date 如果 cdsx_time 是一个 timestamp 列（似乎有可能）。 '2020-05-12' 是一个完全有效的时间戳文本，表示一天的第一个示例。请参见：
在postgresql中生成两个日期之间的时间序列
如果它是一个 timestamptz 列，考虑 timezone 设置！请参见：
在rails和postgresql中完全忽略时区
③ EXISTS 通常（远）比计算所有行的效率更高，因为它可以在找到另一行时立即停止。尤其是如果可以有许多对等点，并且索引支持是可用的。请参见：
选择其他表中不存在的行
④ 假设 id 是唯一的（或pk）。否则使用系统列 ctid 为了这份工作。请参见：
如何（或如何）在多个列上选择distinct？
⑤ 方便、等价的行值简短语法。请参见：
强制索引扫描进行多列比较
⑥ 您的查询有：

and I.cdsx_time  = O.in_msg_time         -- !?
and I.cdsx_time::text like '2020-05-12%'

... 但是：

O.cdsx_time::text like '2020-05-12%'

你不是有意要写的 and I.cdsx_time = O.cdsx_time ?
⑦ 可能是噪音。该限制已在子查询中实施(也无助于索引支持。）
⑧ 如果某些列可能已经具有所需的值，则此项非常重要。然后跳过该操作，而不是以全部成本写入相同的行版本。
如果两列都已定义 NOT NULL ，简化为 o.in_msg_id <> i.id . 请再次参阅：
在postgresql中用另一个表的列更新一个表的列

指数

如果性能是一个问题或您重复运行此操作，请考虑以下索引：
对于第一个（按预期查询计划的顺序！）确定源行候选的步骤：

CREATE INDEX foo ON prod_replay_in (msg_type, cdsx_time);

排除重复的第二步：

CREATE INDEX foo ON prod_replay_in (seller, buyer, sender_tag, seller_tag, buyer_tag, isin, quantity, cdsx_time);

或者任何足够有选择性的小子集。如果在索引扫描中包含相对较少的附加行作为“误报”，则在较少的列上使用较小的索引通常效率更高。虽然数量相对较少，但在以下情况下，它们被廉价地消除了 FILTER 步骤。
对于标识目标行的最后一步：

CREATE INDEX foo ON prod_replay_out (msg_type, in_msg_time);

再说一次：或者任何一个足够有选择性的小子集。

赞(0）回复(0）举报 2021-07-24

我来回答

如果子查询在postgres中返回多行，则跳过行

2条答案

查询

指数

相关问题

热门标签

最新问答