postgresql SQL:两个没有重复的聚合函数

uxh89sit  于 2023-06-22  发布在  PostgreSQL
关注(0)|答案(2)|浏览(222)

如何使用两个聚合函数(例如double string_agg或仅sum),但确保结果不包含由另一个聚合函数(由第二次连接引起)引起的重复?我使用PostgreSQL。

示例

我有三张table:

create table boxes
(
    id   bigserial primary key,
    name varchar(255)
);

create table animals
(
    id     bigserial primary key,
    name   varchar(255),
    age    numeric,
    box_id bigint constraint animals_boxes_id references boxes
);

create table vegetables
(
    id     bigserial primary key,
    name   varchar(255),
    weight numeric,
    box_id bigint constraint vegatables_box_id references boxes
);

一些输入数据:

insert into boxes (name) values ('First box');
insert into animals (box_id, name, age) values (1, 'Cat', 2);
insert into animals (box_id, name, age) values (1, 'Cat', 3);
insert into animals (box_id, name, age) values (1, 'Dog', 5);
insert into vegetables (box_id, name, weight) values (1, 'Tomato', 20);
insert into vegetables (box_id, name, weight) values (1, 'Cucumber', 30);
insert into vegetables (box_id, name, weight) values (1, 'Potato', 50);

我想把动物的名字放在盒子里:

select b.name                                 as box_name,
       string_agg(a.name, ', ' order by a.id) as animal_names
from boxes as b
         left join animals a on b.id = a.box_id
group by b.name;

它的工作原理:
| 盒名|动物名称|
| - -----|- -----|
| 第一个盒子|猫,猫,狗|
但我也想得到蔬菜的名字。但它不工作

select b.name                                 as box_name,
       string_agg(a.name, ', ' order by a.id) as animal_names,
       string_agg(v.name, ', ' order by v.id) as vegatable_names
from boxes as b
         left join animals a on b.id = a.box_id
         left join vegetables v on b.id = v.box_id
group by b.name;

它在动物名称和植物名称中产生重复:
| 盒名|动物名称|蔬菜名称|
| - -----|- -----|- -----|
| 第一个盒子|猫猫猫猫狗狗|番茄,番茄,番茄, cucumber , cucumber , cucumber ,土豆,土豆,土豆|
结果应为:
| 盒名|动物名称|蔬菜名称|
| - -----|- -----|- -----|
| 第一个盒子|猫,猫,狗|番茄, cucumber ,土豆|
我不能添加distinct来删除重复项,因为:

  • 表中的名称可以重复(两个名称为Cat的动物)。如果我使用distinct,它会产生Cat, Dog而不是Cat, Cat, Dog
  • 我在string_agg中使用order by(添加distinct时会导致ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list)。即使我删除了order bystring_agg(distinct a.name, ', ')),我也不能使用它,因为第一个点。

附加信息

它适用于所有聚合函数:string_aggarray_aggjson_object_agg和甚至sum
动物总年龄:

select sum(a.age)
from boxes as b
         left join animals a on b.id = a.box_id
         -- left join vegetables v on b.id = v.box_id
group by b.name;

如果没有第二个连接,它计算正确(10),但由于重复,它计算错误(30)。

icnyk63a

icnyk63a1#

基本问题解释如下:

  • 两个SQL LEFT JOIN生成错误结果

对于较小的选择,每行聚合通常更快。
使用**LATERAL子查询**(更通用):

SELECT b.name AS box_name, a.*, v.*
FROM   boxes b
LEFT  JOIN LATERAL (
   SELECT string_agg(a.name, ', ' ORDER BY a.id) AS animal_names
   FROM   animals a
   WHERE  a.box_id = b.id
   ) a ON true
LEFT  JOIN LATERAL (
   SELECT string_agg(v.name, ', ' ORDER BY v.id) AS vegetable_names
   FROM   vegetables v
   WHERE  v.box_id = b.id
   ) v ON true;

或者使用correlated subqueries(更简单,通常更快):

SELECT b.name AS box_name
    , (SELECT string_agg(a.name, ', ' ORDER BY a.id)
       FROM   animals a
       WHERE  a.box_id = b.id)  AS animal_names
    , (SELECT string_agg(v.name, ', ' ORDER BY v.id)
       FROM   vegetables v
       WHERE  v.box_id = b.id) AS vegetable_names
FROM   boxes b;

参见:

  • PostgreSQL中的LATERAL JOIN和subquery有什么区别?
  • 单个查询中的多个array_agg()调用

在聚合整个表时,这样更快:

SELECT b.name AS box_name, a.animal_names, v.vegetable_names
FROM   boxes b
LEFT   JOIN (
   SELECT box_id, string_agg(a.name, ', ') AS animal_names   
   FROM  (
      SELECT box_id, id, name
      FROM   animals a
      ORDER  BY box_id, id
      ) a
   GROUP  BY 1
   ) a ON a.box_id = b.id
LEFT   JOIN (
   SELECT box_id, string_agg(v.name, ', ') AS vegetable_names 
   FROM  (
      SELECT box_id, id, name
      FROM   vegetables v
      ORDER  BY box_id, id
      ) v
   GROUP  BY 1   
   ) v ON v.box_id = b.id;

fiddle
请注意我是如何在子查询中进行排序的,这通常比按聚合进行排序要快。可选的优化。
旁白:关于测试设置中的varchar(255)

w51jfk4q

w51jfk4q2#

尝试将vegetables表连接到animals

SELECT b.name AS box_name
    ,string_agg(a.name, ', ' ORDER BY a.id) AS animal_names
    ,string_agg(v.name, ', ' ORDER BY v.id) AS vegetable_names
FROM boxes AS b
LEFT JOIN animals a ON b.id = a.box_id
LEFT JOIN vegetables v ON v.id = a.id
GROUP BY b.name;

输出:
| 盒名|动物名称|菜名|
| - -----|- -----|- -----|
| 第一个盒子|猫,猫,狗|番茄, cucumber ,土豆|
fiddle这里

相关问题