PostgreSQL: counting distinct values with OVER (PARTITION BY id)

Posted by 57hvy0tb on 2022-11-23 in PostgreSQL

Is it possible to count distinct values in combination with a window function such as OVER (PARTITION BY id)? My query currently looks like this:

SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
   congestion.id_element,
ROW_NUMBER() OVER(
    PARTITION BY congestion.id_element
    ORDER BY congestion.date),
COUNT(DISTINCT congestion.week_nb) OVER(
    PARTITION BY congestion.id_element
) AS week_count
FROM congestion
WHERE congestion.date >= '2014.01.01'
AND congestion.date <= '2014.12.31'
ORDER BY id_element, date

However, when I try to run the query, I get the following error:

"COUNT(DISTINCT": "DISTINCT is not implemented for window functions"

trnvg8h3 #1

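No. As the error message says, DISTINCT is not implemented for window functions. Applying the information from this link to your case, you could use something like the following: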

WITH uniques AS (
 SELECT congestion.id_element, COUNT(DISTINCT congestion.week_nb) AS unique_references
 FROM congestion
WHERE congestion.date >= '2014.01.01'
AND congestion.date <= '2014.12.31'
 GROUP BY congestion.id_element
)

SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
   congestion.id_element,
ROW_NUMBER() OVER(
    PARTITION BY congestion.id_element
    ORDER BY congestion.date),
uniques.unique_references AS week_count
FROM congestion
JOIN uniques USING (id_element)
WHERE congestion.date >= '2014.01.01'
AND congestion.date <= '2014.12.31'
ORDER BY id_element, date

Depending on the situation, you can also put the subquery directly into the SELECT list:

SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
   congestion.id_element,
ROW_NUMBER() OVER(
    PARTITION BY congestion.id_element
    ORDER BY congestion.date),
(SELECT COUNT(DISTINCT dist_con.week_nb)
    FROM congestion AS dist_con
    WHERE dist_con.date >= '2014.01.01'
    AND dist_con.date <= '2014.12.31'
    AND dist_con.id_element = congestion.id_element) AS week_count
FROM congestion
WHERE congestion.date >= '2014.01.01'
AND congestion.date <= '2014.12.31'
ORDER BY id_element, date
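
The CTE-plus-join approach can be tried out on a small in-memory table. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL (an assumption made only for portability; window functions need SQLite 3.25 or later). The sample rows, the ISO-formatted dates, and the `rn` alias are invented for illustration and are not part of the original question.

```python
import sqlite3

# Hypothetical sample rows: (date, week_nb, id_congestion, id_element).
rows = [
    ("2014-01-06", 2, 1, 10),
    ("2014-01-07", 2, 2, 10),
    ("2014-01-14", 3, 3, 10),
    ("2014-02-03", 6, 4, 20),
    ("2014-02-04", 6, 5, 20),
    ("2013-12-30", 1, 6, 20),  # outside the 2014 range, so it is filtered out
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE congestion "
    "(date TEXT, week_nb INTEGER, id_congestion INTEGER, id_element INTEGER)"
)
conn.executemany("INSERT INTO congestion VALUES (?, ?, ?, ?)", rows)

query = """
WITH uniques AS (
    -- one row per element with its distinct week count
    SELECT id_element, COUNT(DISTINCT week_nb) AS unique_references
    FROM congestion
    WHERE date >= '2014-01-01' AND date <= '2014-12-31'
    GROUP BY id_element
)
SELECT c.id_congestion, c.id_element,
       ROW_NUMBER() OVER (PARTITION BY c.id_element ORDER BY c.date) AS rn,
       u.unique_references AS week_count
FROM congestion AS c
JOIN uniques AS u USING (id_element)
WHERE c.date >= '2014-01-01' AND c.date <= '2014-12-31'
ORDER BY c.id_element, c.date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Every row of an element carries the same week_count, while the row number still restarts per element. Note that the date filter has to appear both in the CTE and in the outer query, otherwise the two would count over different sets of rows.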

j13ufse2 #2

The simplest approach I have found is a subquery/CTE combined with conditional aggregation:

SELECT
  c.date,
  c.week_nb,
  c.id_congestion,
  c.id_element,
  ROW_NUMBER() OVER (PARTITION BY c.id_element ORDER BY c.date),
  -- sum the "first row of each week" flags to get the distinct week count
  SUM(
    CASE WHEN c.seqnum = 1 THEN
      1
    ELSE
      0
    END) OVER (PARTITION BY c.id_element) AS week_count
FROM (
  SELECT
    c.*,
    -- seqnum = 1 marks the first row of each (id_element, week_nb) pair
    ROW_NUMBER() OVER (PARTITION BY c.id_element, c.week_nb ORDER BY c.date) AS seqnum
  FROM
    congestion c
  WHERE
    c.date >= '2014.01.01'
    AND c.date <= '2014.12.31') c
ORDER BY
  id_element,
  date

wydwbb8l #3

Make the partition smaller until there are no more duplicates in the field being counted:

SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
   congestion.id_element,
ROW_NUMBER() OVER(
    PARTITION BY congestion.id_element
    ORDER BY congestion.date),
COUNT(congestion.week_nb) -- remove distinct 
OVER(
    PARTITION BY congestion.id_element,
                 -- add further columns here so the counter restarts wherever a duplicate would occur
                 congestion.id_congestion
) AS week_count
FROM congestion
WHERE congestion.date >= '2014.01.01'
AND congestion.date <= '2014.12.31'
ORDER BY id_element, date

qij5mzcb #4

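Since this is the first result that pops up on Google, I'll add this reproducible example, which is similar to Gordon's answer.
First we create a sample table: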

WITH test as 
(
SELECT * 
FROM (VALUES
(1, 'A'),
(1, 'A'),
(2, 'B'),
(2, 'B'),
(2, 'D'),
(3, 'C'),
(3, 'C'),
(3, 'C'),
(3, 'E'),
(3, 'F')) AS t (id_element, week_nb)
)

select * from test

This yields:

id_element week_nb
1   A
1   A
2   B
2   B
2   D
3   C
3   C
3   C
3   E
3   F

Then do something like this:

select 
  id_element,
  week_nb,
  sum(first_row_in_sequence) over (partition by id_element) as distinct_week_nb_count
from 
(
select 
  id_element,
  week_nb,
  case when row_number() over (partition by id_element, week_nb) = 1 then 1 else 0 end as first_row_in_sequence
from test
) as sub

which yields:

id_element week_nb distinct_week_nb_count
1   A   1
1   A   1
2   B   2
2   B   2
2   D   2
3   C   3
3   C   3
3   C   3
3   E   3
3   F   3
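
As a sanity check, the windowed count can be compared with an ordinary GROUP BY using COUNT(DISTINCT ...). The sketch below does this with Python's sqlite3 module rather than PostgreSQL (an assumption made for convenience; it needs SQLite 3.25 or later), loading the same ten sample rows into a real table named `test`. The `counted` alias and the variable names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id_element INTEGER, week_nb TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, "A"), (1, "A"), (2, "B"), (2, "B"), (2, "D"),
     (3, "C"), (3, "C"), (3, "C"), (3, "E"), (3, "F")],
)

# Windowed version: flag one row per (id_element, week_nb), then sum the flags
# per id_element. The outer DISTINCT collapses the repeated per-row results.
windowed = conn.execute("""
    SELECT DISTINCT id_element, distinct_week_nb_count
    FROM (
        SELECT id_element,
               SUM(first_row_in_sequence) OVER (PARTITION BY id_element)
                   AS distinct_week_nb_count
        FROM (
            SELECT id_element, week_nb,
                   CASE WHEN ROW_NUMBER() OVER (PARTITION BY id_element, week_nb) = 1
                        THEN 1 ELSE 0 END AS first_row_in_sequence
            FROM test
        ) AS sub
    ) AS counted
    ORDER BY id_element
""").fetchall()

# Reference version: plain aggregate COUNT(DISTINCT ...), one row per element.
grouped = conn.execute("""
    SELECT id_element, COUNT(DISTINCT week_nb)
    FROM test
    GROUP BY id_element
    ORDER BY id_element
""").fetchall()

print(windowed)
print(grouped)
```

Both queries should report the same count for each id_element. The windowed form is the one to use when every detail row has to keep its own line in the output.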

nimxete2 #5

If you are counting distinct numbers, you can use other aggregate functions to get the same effect, as shown below.
First
