postgresql 如何根据另一列的值进行条件聚合?

idfiyjo8  于 2022-11-29  发布在  PostgreSQL
关注(0)|答案(1)|浏览(172)

我有一个名为device_data的表,如下所示

Column     |            Type             | Collation | Nullable |                     Default                     
----------------+-----------------------------+-----------+----------+-------------------------------------------------
 id             | integer                     |           |          | nextval('device_data_id_seq'::regclass)
 date           | timestamp without time zone |           |          | 
 packet_drop    | real                        |           |          | 
 jitter         | real                        |           |          | 
 latency        | real                        |           |          | 
 alert          | character varying(50)       |           |          |

它基本上以分钟为单位存储packet dropsjitterlatency
现在有另一个名为alert的列,它根据阈值保存值HIGHMEDIUMLOW,如果不满足阈值,则保存空字符串``。
现在,我在表上执行每小时平均值,查询如下所示

select date_trunc('hour', date) as hourly, avg(jitter), avg(latency), avg(packet_drop) from device_data where date between '2022-11-26 17:41:11' and '2022-11-27 17:41:11' group by hourly order by hourly;

输出

hourly        |        avg         |        avg         |         avg         
---------------------+--------------------+--------------------+---------------------
 2022-11-26 17:00:00 |   3.52857138642243 |  2.771428568022592 |                   0
 2022-11-26 18:00:00 |  2.484615419346553 |  2.815384602546692 |                   0
 2022-11-26 19:00:00 |  2.218461540570626 |  2.723076921242934 |                   0
 2022-11-26 20:00:00 |  5.098461512992015 | 2.7076923021903405 |                   0
 2022-11-26 21:00:00 | 2.0060606116824076 | 2.6469696814363655 |                   0
 2022-11-26 22:00:00 |  5.672307815345434 |  2.810769222332881 |                   0
 2022-11-26 23:00:00 | 2.7828124976949766 |  2.893749985843897 |                   0
 2022-11-27 00:00:00 | 2.6046153992414474 | 2.8030769238105187 |                   0
 2022-11-27 01:00:00 |  3.846031717837803 | 2.8333333200878568 |                   0
 ...
(25 rows)

接下来,我对具有附加列alert的表进行每小时平均,因此我的查询如下所示

select date_trunc('hour', date) as hourly, avg(jitter), avg(latency), avg(packet_drop), alert from device_data where date between '2022-11-26 17:41:11' and '2022-11-27 17:41:11' group by hourly, alert order by hourly;

输出

hourly        |        avg         |        avg         |         avg         | alert 
---------------------+--------------------+--------------------+---------------------+----------------
 2022-11-26 17:00:00 | 1.2649999938905239 |  2.755000001192093 |                   0 | 
 2022-11-26 17:00:00 |  48.79999923706055 | 3.0999999046325684 |                   0 | MEDIUM
 2022-11-26 18:00:00 |  2.484615419346553 |  2.815384602546692 |                   0 | 
 2022-11-26 19:00:00 |  2.218461540570626 |  2.723076921242934 |                   0 | 
 2022-11-26 20:00:00 | 2.6603174461495307 |     2.695238092589 |                   0 | 
 2022-11-26 20:00:00 |  106.5999984741211 | 3.0999999046325684 |                   0 | HIGH
 2022-11-26 20:00:00 |  57.20000076293945 | 3.0999999046325684 |                   0 | MEDIUM
 2022-11-26 21:00:00 | 2.0060606116824076 | 2.6469696814363655 |                   0 | 
 2022-11-26 22:00:00 | 2.8349206649831364 |  2.793650784189739 |                   0 | 
 2022-11-26 22:00:00 |  95.05000305175781 |  3.350000023841858 |                   0 | HIGH
 2022-11-26 23:00:00 | 2.7828124976949766 |  2.893749985843897 |                   0 | 
 2022-11-27 00:00:00 |  2.132812526775524 |           2.796875 |                   0 | 
 2022-11-27 00:00:00 |  32.79999923706055 |  3.200000047683716 |                   0 | LOW
 2022-11-27 01:00:00 |  1.849999995962266 |  2.822580631702177 |                   0 | 
 2022-11-27 01:00:00 |  127.5999984741211 |                3.5 |                   0 | HIGH
 ...
(35 rows)

正如您在我的第二个查询中所看到的,一些行具有重复的时间戳,如2022-11-26 17:00:002022-11-26 20:00:00等。
据我所知,当平均列时,在alert列的情况下,它会遇到多个值,可能是空字符串沿着一些HIGHMEDIUMLOW字符串,也可能是导致同一时间戳的多行的其他组合。

要求

我想要的是简单的平均值(最初忽略alert列值),然后检查alert列中该特定小时的HIGH字符串(我做的是每小时平均)。如果字符串存在,只需在alert列中为该行赋值HIGH。如果没有找到HIGH字符串,只需在alert列中为该行分配空字符串``。

预期输出

hourly        |        avg         |        avg         |         avg         | alert 
---------------------+--------------------+--------------------+---------------------+----------------
 2022-11-26 17:00:00 |  3.52857138642243   |  2.771428568022592 |                   0 | 
 2022-11-26 18:00:00 |  2.484615419346553  |  2.815384602546692 |                   0 | 
 2022-11-26 19:00:00 |  2.218461540570626  |  2.723076921242934 |                   0 | 
 2022-11-26 20:00:00 |  5.098461512992015  | 2.7076923021903405 |                   0 |  HIGH
 2022-11-26 21:00:00 |  2.0060606116824076 | 2.6469696814363655 |                   0 | 
 2022-11-26 22:00:00 |  5.672307815345434  |  2.810769222332881 |                   0 |  HIGH
 2022-11-26 23:00:00 |  2.7828124976949766 |  2.893749985843897 |                   0 | 
 2022-11-27 00:00:00 |  2.6046153992414474 | 2.8030769238105187 |                   0 | 
 2022-11-27 01:00:00 |  3.846031717837803  | 2.8333333200878568 |                   0 |  HIGH
 ...
(25 rows)

我想通过执行类似于where alert = 'HIGH'的操作,只将alert列中包含HIGH值的行作为目标,但这样会删除其他行,并影响平均值计算。
我该怎么做?

ki0zmccv

ki0zmccv1#

您还需要为列“alert”使用聚合函数。由于您只想显示某个值,因此可以在聚合中使用条件表达式,如下所示:

select 
     date_trunc('hour', date) as hourly
   , avg(jitter)
   , avg(latency)
   , avg(packet_drop)
   , coalesce(MAX(case when alert='HIGH' then 'HIGH' end),'') as alert 
from device_data 
where date between '2022-11-26 17:41:11' and '2022-11-27 17:41:11' 
group by hourly order by hourly;

这应该会起作用,因为MAX函数会忽略NULL值(除HIGH之外的所有警报值都将导致case表达式中的结果为NULL)。

相关问题