PostgreSQL: Finding co-occurrences of array elements

dgtucam1 posted on 2023-08-04 in PostgreSQL

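I have a text array field in Postgres; let's call it items. I want to write a query that tells me how many times each of these items occurs together with each of the others.
For an example set of rows: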

items
-----
{'a', 'c'}
{'a', 'b', 'c'}
{'a', 'c'}
{'a', 'b', 'c'}

Here is sample output, using : to separate the item name from the co-occurrence count:

item|co_occurrences
-------------------
a   |{c:4,b:2}
b   |{a:2,c:2}
c   |{a:4,b:2}


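The item column lists the individual items. The co_occurrences column is an array of text elements, each combining a co-occurring item with its count. What query would produce this result?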
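To try the queries from the answers, the sample rows can be loaded into a table first. This is only a minimal sketch: the question does not name its table, so the name `example` is borrowed from the first answer, and the `item_groups` view exists only so that the second answer's queries run against the same rows.

```sql
-- Assumed setup: the table and view names come from the answers, not from the question.
CREATE TABLE example (items text[]);

INSERT INTO example (items) VALUES
  (ARRAY['a', 'c']),
  (ARRAY['a', 'b', 'c']),
  (ARRAY['a', 'c']),
  (ARRAY['a', 'b', 'c']);

-- The same rows under the name used by the second answer.
CREATE VIEW item_groups AS
SELECT items FROM example;
```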

Answer #1 (uxhixvfz)

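Rather than generating row ids for a self-join as @MikeOrganek did, I would simply unnest twice. That yields a relation with one (x, y) tuple for every pairing within a row, duplicates included, which can then be counted: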

SELECT x AS item, json_object_agg(y, count ORDER BY count DESC) AS co_occurences
FROM (
  SELECT x, y, count(*)
  FROM example, unnest(items) AS x, unnest(items) AS y
  WHERE x <> y
  GROUP BY x, y
) tmp
GROUP BY x;
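The query above returns one JSON object per item. If the text-array format shown in the question is wanted instead, the same double unnest can probably feed `array_agg`. A sketch, assuming the table is named `example` as above; the alias `n` and the tie-breaking sort on `y` are additions for illustration:

```sql
-- Double unnest, then format each pair as 'item:count' inside a text array.
SELECT x AS item,
       array_agg(y || ':' || n ORDER BY n DESC, y) AS co_occurrences
FROM (
  SELECT x, y, count(*) AS n
  FROM example, unnest(items) AS x, unnest(items) AS y
  WHERE x <> y          -- drop the pairing of an element with itself
  GROUP BY x, y
) AS pairs
GROUP BY x
ORDER BY x;

-- For the four sample rows this should give:
--  a | {c:4,b:2}
--  b | {a:2,c:2}
--  c | {a:4,b:2}
```

Note that an array listing the same element twice (say {'a', 'a', 'c'}) would have its pairs counted twice for that row, so the input would need deduplicating first if that can happen.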


Answer #2 (zzwlnbp8)

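First convert the data to a normalized relational form.
The following code assigns an arbitrary id to each row, unnests the arrays, and then self-joins on the row id: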

with create_ids as (
  select row_number() over (order by items) as id,
         items
    from item_groups
), normalize as (
  select i.id, u.item
    from create_ids i
         cross join lateral unnest(i.items) as u(item)
), correlate as (
  select a.item, b.item as coitem, count(b.item) as occurence_count
    from normalize a
         left join normalize b
           on b.id = a.id and b.item != a.item
   group by a.item, b.item
)
select item, jsonb_object_agg(coitem, occurence_count) as co_occurences
  from correlate 
 group by item;
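One edge case with the left join is worth checking: an item that only ever appears alone (for example a row {'d'}) ends up with a NULL coitem, and `jsonb_object_agg` raises an error on NULL keys. A possible guard, sketched under the assumption that such items should still be listed (with a NULL co-occurrence object), is a FILTER clause on the aggregate:

```sql
-- Same CTEs as above; only the final aggregate changes.
with create_ids as (
  select row_number() over (order by items) as id,
         items
    from item_groups
), normalize as (
  select i.id, u.item
    from create_ids i
         cross join lateral unnest(i.items) as u(item)
), correlate as (
  select a.item, b.item as coitem, count(b.item) as occurence_count
    from normalize a
         left join normalize b
           on b.id = a.id and b.item != a.item
   group by a.item, b.item
)
select item,
       -- skip the NULL coitem produced for items that have no partner
       jsonb_object_agg(coitem, occurence_count)
         filter (where coitem is not null) as co_occurences
  from correlate
 group by item;
```

If items without any partner should not appear at all, switching the left join to an inner join would be the simpler change.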

If the result does not have to be JSON, we can build an array of strings instead. This also allows the values within the array to be ordered:

with create_ids as (
  select row_number() over (order by items) as id,
         items
    from item_groups
), normalize as (
  select i.id, u.item
    from create_ids i
         cross join lateral unnest(i.items) as u(item)
), correlate as (
  select a.item, b.item as coitem, count(b.item) as occurence_count
    from normalize a
         left join normalize b
           on b.id = a.id and b.item != a.item
   group by a.item, b.item
)
select item, array_agg(coitem||':'||occurence_count order by occurence_count desc) as co_occurences
  from correlate 
 group by item;
