PostgreSQL: pivot table with multiple value columns

qij5mzcb, posted 2023-11-18 in PostgreSQL

I have a Postgres table with product data from different manufacturers. Here is the simplified table structure:

CREATE TABLE test_table (
  sku               text,
  manufacturer_name text,
  price             double precision,
  stock             int
);

INSERT INTO test_table
VALUES ('sku1', 'Manufacturer1', 110.00, 22),
       ('sku1', 'Manufacturer2', 120.00, 15),
       ('sku1', 'Manufacturer3', 130.00, 1),
       ('sku1', 'Manufacturer3', 30.00, 11),
       ('sku2', 'Manufacturer1', 10.00, 2),
       ('sku2', 'Manufacturer2', 9.00,  3),
       ('sku3', 'Manufacturer2', 21.00, 3),
       ('sku3', 'Manufacturer2', 1.00, 7),
       ('sku3', 'Manufacturer3', 19.00, 5);

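For each SKU I need to output every manufacturer, but if the same manufacturer appears several times for the same SKU, I need to pick the row with the lowest price (note that I also need to include the 'stock' column). Here is the desired result: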

| sku  | man1_price | man1_stock | man2_price | man2_stock | man3_price | man3_stock |
|------|------------|------------|------------|------------|------------|------------|
| sku1 | 110.0      | 22         | 120.0      | 15         | 30.0       | 11         |
| sku2 | 10.0       | 2          | 9.0        | 3          |            |            |
| sku3 |            |            | 1.0        | 7          | 19.0       | 5          |


I tried using the Postgres crosstab() function:

SELECT *
FROM crosstab('SELECT sku, manufacturer_name, price
              FROM test_table
              ORDER BY 1,2',
              $$ SELECT DISTINCT manufacturer_name FROM test_table ORDER BY 1 $$
       )
       AS ct (sku text, "man1_price" double precision,
              "man2_price" double precision,
              "man3_price" double precision
    );


But this produces a table with only the price columns, and I did not find a way to include the stock columns.
I also tried conditional aggregation:

SELECT sku,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer1' THEN price END) as man1_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer1' THEN stock END) as man1_stock,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer2' THEN price END) as man2_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer2' THEN stock END) as man2_stock,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer3' THEN price END) as man3_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer3' THEN stock END) as man3_stock
FROM test_table
GROUP BY sku
ORDER BY sku


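This query does not work in my case either. It simply picks the lowest stock level: if the same manufacturer appears several times for one SKU with different prices and stock, the query takes the lowest price from one row and the lowest stock from another.
How can I output each manufacturer's price together with the corresponding stock from this table?
P.S. Thank you all for your answers. My Postgres table is fairly small, no more than 15k products (I don't know whether numbers of this size allow a proper comparison), but since Erwin Brandstetter asked for a performance comparison of the different queries, I ran 3 queries with EXPLAIN ANALYZE. Here are their execution times: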

Erwin Brandstetter query:        400 - 450 ms 
Kjetil S query:                  250 - 300 ms
Gordon Linoff query:             200 - 250 ms
a_horse_with_no_name query:      250 - 300 ms


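Again, I'm not sure these numbers can serve as a reference. For my case I chose a combined version of the Kjetil S and Gordon Linoff queries, but the Erwin Brandstetter and a_horse_with_no_name variants are also very useful and interesting. It is worth noting that if my table ends up with more manufacturers in the future, adjusting the query and typing in their names every time will be tedious, so the query from a_horse_with_no_name's answer would be the most convenient to use.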

yc0p9oo0 1#

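Your last query almost works. But you need to add a WHERE condition that removes every row that is not the lowest price for its manufacturer and SKU. This produces your expected result: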

select
  sku,
  min( case when manufacturer_name='Manufacturer1' then price end ) man1_price,
  min( case when manufacturer_name='Manufacturer1' then stock end ) man1_stock,
  min( case when manufacturer_name='Manufacturer2' then price end ) man2_price,
  min( case when manufacturer_name='Manufacturer2' then stock end ) man2_stock,
  min( case when manufacturer_name='Manufacturer3' then price end ) man3_price,
  min( case when manufacturer_name='Manufacturer3' then stock end ) man3_stock
from test_table t
where not exists (
    select 1 from test_table
    where sku=t.sku
    and manufacturer_name=t.manufacturer_name
    and price<t.price
)
group by sku
order by 1;
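
To see what the WHERE clause does on its own, the filter can be run without the aggregation. This is only an illustrative sketch against the sample data from the question; it lists the rows that survive before GROUP BY collapses them:

```sql
-- Sketch: the NOT EXISTS filter alone, without the pivot.
-- A row is kept only if no cheaper row exists for the same sku and manufacturer.
SELECT sku, manufacturer_name, price, stock
FROM   test_table t
WHERE  NOT EXISTS (
   SELECT 1
   FROM   test_table
   WHERE  sku = t.sku
   AND    manufacturer_name = t.manufacturer_name
   AND    price < t.price
   )
ORDER  BY sku, manufacturer_name;

-- With the sample data, 7 of the 9 rows remain:
--   sku1 | Manufacturer1 | 110 | 22
--   sku1 | Manufacturer2 | 120 | 15
--   sku1 | Manufacturer3 |  30 | 11
--   sku2 | Manufacturer1 |  10 |  2
--   sku2 | Manufacturer2 |   9 |  3
--   sku3 | Manufacturer2 |   1 |  7
--   sku3 | Manufacturer3 |  19 |  5
```

Each (sku, manufacturer_name) pair is left with a single row here, so min() over price and stock reads both values from that same row. If two rows tied on the lowest price, both would survive the filter and min(stock) would return the smaller of the two stock values.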


q9yhzks0 2#

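I find that working with a JSON result is much easier these days than using a complex pivot. Producing a single aggregated JSON value does not break the inherent restriction of SQL that the number of columns must be known before the query is executed (and must be the same for all rows).
You could use something like this: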

select sku, 
       jsonb_object_agg(manufacturer_name, 
                          jsonb_build_object('price', price, 'stock', stock, 'isMinPrice', price = min_price)) as price_info
from (
  select sku, 
         manufacturer_name,
         price, 
         min(price) over (partition by sku) as min_price,
         stock
  from test_table
) t
group by sku;

With the sample data, the above returns the following result:

sku  | price_info                                                                                                                                                                                             
-----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sku1 | {"Manufacturer1": {"price": 110, "stock": 22, "isMinPrice": false}, "Manufacturer2": {"price": 120, "stock": 15, "isMinPrice": false}, "Manufacturer3": {"price": 30, "stock": 11, "isMinPrice": true}}
sku2 | {"Manufacturer1": {"price": 10, "stock": 2, "isMinPrice": false}, "Manufacturer2": {"price": 9, "stock": 3, "isMinPrice": true}}                                                                       
sku3 | {"Manufacturer2": {"price": 1, "stock": 7, "isMinPrice": true}, "Manufacturer3": {"price": 19, "stock": 5, "isMinPrice": false}}
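
If individual columns are still needed on the database side, the values can be read back out of the JSON document. The following is a minimal sketch, not part of the original answer: it assumes you first reduce the data to the cheapest row per (sku, manufacturer_name) with DISTINCT ON, so that duplicate keys never reach jsonb_object_agg(), and it drops the isMinPrice flag for brevity. The CTE names cheapest and agg are made up for illustration.

```sql
-- Sketch: aggregate the cheapest row per manufacturer into JSON, then extract fields.
WITH cheapest AS (
   SELECT DISTINCT ON (sku, manufacturer_name)
          sku, manufacturer_name, price, stock
   FROM   test_table
   ORDER  BY sku, manufacturer_name, price   -- lowest price first within each pair
), agg AS (
   SELECT sku,
          jsonb_object_agg(manufacturer_name,
                           jsonb_build_object('price', price, 'stock', stock)) AS price_info
   FROM   cheapest
   GROUP  BY sku
)
SELECT sku,
       -- "->" returns the nested object, "->>" returns the field as text
       (price_info -> 'Manufacturer1' ->> 'price')::float8 AS man1_price,
       (price_info -> 'Manufacturer1' ->> 'stock')::int    AS man1_stock
FROM   agg
ORDER  BY sku;
```

A SKU without an entry for the requested manufacturer yields NULL in both columns, because "->" returns NULL for a missing key.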

7fhtutme 3#

I would use distinct on to limit the data to one price per manufacturer. And I like the filter functionality in Postgres. So:

select sku,
       max(price) filter (where manufacturer_name = 'Manufacturer1') as man1_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer1') as man1_stock,
       max(price) filter (where manufacturer_name = 'Manufacturer2') as man2_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer2') as man2_stock,
       max(price) filter (where manufacturer_name = 'Manufacturer3') as man3_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer3') as man3_stock
from (select distinct on (manufacturer_name, sku) t.*
      from test_table t
      order by manufacturer_name, sku, price
     ) t
group by sku
order by sku;
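
Because the manufacturer names are hard-coded above, the query has to be edited whenever a manufacturer is added. One possible workaround, shown here only as a sketch under assumptions, is to let Postgres generate the query text from the data with format() and string_agg(). The output alias pivot_sql and the lower-cased column names (manufacturer1_price and so on) are choices made for this example, not something from the answer.

```sql
-- Sketch: build the text of the pivot query from the distinct manufacturer names.
-- %L quotes a value as a string literal, %I quotes it as an identifier.
SELECT format(
         'SELECT sku, %s
          FROM  (SELECT DISTINCT ON (manufacturer_name, sku) *
                 FROM   test_table
                 ORDER  BY manufacturer_name, sku, price) t
          GROUP  BY sku
          ORDER  BY sku'
       , string_agg(
            format('max(price) FILTER (WHERE manufacturer_name = %1$L) AS %2$I, max(stock) FILTER (WHERE manufacturer_name = %1$L) AS %3$I'
                 , manufacturer_name
                 , lower(manufacturer_name) || '_price'
                 , lower(manufacturer_name) || '_stock')
          , ', ' ORDER BY manufacturer_name)
       ) AS pivot_sql
FROM  (SELECT DISTINCT manufacturer_name FROM test_table) m;
```

This returns a single row containing the generated statement, which then has to be executed in a second step. In psql, ending the statement with \gexec instead of the semicolon should run the generated text directly; from application code you would send the returned string as a new query.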


ntjbwcob 4#

crosstab() must be provided with a static column definition list. Your second parameter:

$$ SELECT DISTINCT manufacturer_name FROM test_table ORDER BY 1 $$

..
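The core problem of the task is that crosstab() expects a single "value" column from the query in its first parameter, but you want to process two value columns (price and stock) per row.
One way to solve this is to pack multiple values into a composite type and extract them in the outer SELECT.
Create the composite type once: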

CREATE TYPE price_stock AS (price float8, stock int);


A temporary table or view would also serve the purpose.
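
As a small preparatory sketch (not part of the original answer): crosstab() is provided by the tablefunc extension, which has to be installed once per database, and a composite value can be built and taken apart on its own to check the type. The alias ps and the literal values are made up for illustration.

```sql
-- crosstab() lives in the additional module tablefunc; install it once per database.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Build a price_stock value from a row constructor and read its fields back.
-- The parentheses around (ps) are required so that ps is not parsed as a table name.
SELECT (ps).price, (ps).stock
FROM  (SELECT (110.0::float8, 22)::price_stock AS ps) sub;
```

The same field access, (ps1).price and so on, is what the outer SELECT of the crosstab query uses to unpack each manufacturer's column.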
Then:

SELECT sku
     , (ps1).price AS price1, (ps1).stock AS stock1
     , (ps2).price AS price2, (ps2).stock AS stock2
     , (ps3).price AS price3, (ps3).stock AS stock3
FROM   crosstab(
   'SELECT DISTINCT ON (sku, manufacturer_name)
           sku, manufacturer_name, (price, stock)::price_stock
    FROM   test_table
    ORDER  BY sku, manufacturer_name, price'
  , $$VALUES ('Manufacturer1'),('Manufacturer2'),('Manufacturer3')$$
    ) AS ct (sku text
           , ps1 price_stock
           , ps2 price_stock
           , ps3 price_stock
    );


Note the base query with DISTINCT ON:

SELECT DISTINCT ON (sku, manufacturer_name)
       sku, manufacturer_name, (price, stock)::price_stock
FROM   test_table
ORDER  BY sku, manufacturer_name, price


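This is needed because of your requirement:
if the same manufacturer appears several times for the same SKU, I need to pick the row with the lowest price
See: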

  • Select first row in each GROUP BY group?

Alternative with table type

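For a quick test, or if the rows of the underlying table are not too wide, or if you cannot or do not want to create a custom type, you can also use the existing row type of the underlying table: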

-- using table type
SELECT sku
     , (ps1).price AS price1, (ps1).stock AS stock1
     , (ps2).price AS price2, (ps2).stock AS stock2
     , (ps3).price AS price3, (ps3).stock AS stock3
FROM   crosstab(
   'SELECT DISTINCT ON (sku, manufacturer_name)
           sku, manufacturer_name, t
    FROM   test_table t
    ORDER  BY sku, manufacturer_name, price'
  , $$SELECT unnest('{Manufacturer1,Manufacturer2,Manufacturer3}'::text[])$$
    ) AS ct (sku text
           , ps1 test_table
           , ps2 test_table
           , ps3 test_table
    );


Related:

  • Pivot table / crosstab with multiple value columns
  • PostgreSQL Crosstab Query
