PostgreSQL: find the users with the most ratings in each country

bfnvny8b  posted on 2023-05-22 in PostgreSQL

I have the following tables to model a book database:

CREATE TABLE Country (
    ISO_3166 CHAR(2) PRIMARY KEY,
    CountryName VARCHAR(256),
    CID varchar(16)
);
CREATE TABLE Users (
    UID INT PRIMARY KEY,
    Username VARCHAR(256),
    DoB DATE,
    Age INT,
    ISO_3166 CHAR(2) REFERENCES Country (ISO_3166)
);
CREATE TABLE Book (
    ISBN VARCHAR(17) PRIMARY KEY,
    Title VARCHAR(256),
    Published DATE,
    Pages INT,
    Language VARCHAR(256)
);
CREATE TABLE Rating (
    UID INT REFERENCES Users (UID),
    ISBN VARCHAR(17) REFERENCES Book (ISBN),
    PRIMARY KEY (UID,ISBN),
    Rating int
);

I now want to find the users who have the most ratings in each country. I can use this query:

SELECT Country.CountryName as CountryName, Users.Username as Username, COUNT(Rating.Rating) as NumRatings
FROM Country
JOIN Users ON Users.ISO_3166 = Country.ISO_3166 
JOIN Rating ON Users.UID = Rating.UID
GROUP BY Country.CID, CountryName, Username
ORDER BY CountryName ASC

It returns the number of ratings for each user in the following format:

Countryname | Username | Number of Ratings of this user

I also managed to write the following query, which gives me one user per country, but not the one with the most ratings:

SELECT DISTINCT ON (CountryName)
        CountryName, Username, MAX(NumRatings)
FROM (
    SELECT Country.CountryName as CountryName, Users.Username as Username, COUNT(Rating.Rating) as NumRatings
        FROM Country
        JOIN Users ON Users.ISO_3166 = Country.ISO_3166 
        JOIN Rating ON Users.UID = Rating.UID
        GROUP BY Country.CID, CountryName, Username
        ORDER BY CountryName ASC) AS MyTable
GROUP BY CountryName, Username, NumRatings 
ORDER BY CountryName ASC;

But how do I write a query that picks the users with the most ratings for each country?

qzlgjiam  #1

You were so close:

SELECT DISTINCT ON (CountryName)
        CountryName, Username, NumRatings
FROM(
    SELECT Country.CountryName as CountryName, Users.Username as Username, COUNT(Rating.Rating) as NumRatings
        FROM Country
        JOIN Users ON Users.ISO_3166 = Country.ISO_3166 
        JOIN Rating ON Users.UID = Rating.UID
        GROUP BY Country.CID, CountryName, Username
        ORDER BY CountryName ASC) AS MyTable
WHERE TRUE --no filtering needed 
ORDER BY CountryName ASC, NumRatings DESC

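Postgres lets you use ORDER BY to determine which row is kept when several rows share the same value in the DISTINCT ON columns. In this case, ordering by NumRatings descending should give you, for each country, the values from the row with the highest NumRatings.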
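To make the ordering rule concrete, here is a minimal sketch on made-up data (the country names, user names, and counts are hypothetical and stand in for the output of the inner aggregate query). It only illustrates how DISTINCT ON keeps the first row of each group in ORDER BY order:

```sql
-- Hypothetical sample data: per-user rating counts, already aggregated.
WITH counts (countryname, username, numratings) AS (
   VALUES ('Austria', 'anna',  5)
        , ('Austria', 'ben',   3)
        , ('Belgium', 'chloe', 2)
        , ('Belgium', 'dries', 7)
   )
SELECT DISTINCT ON (countryname)
       countryname, username, numratings
FROM   counts
ORDER  BY countryname, numratings DESC;  -- first row per country is kept
-- Returns: (Austria, anna, 5) and (Belgium, dries, 7)
```

Note that DISTINCT ON always returns exactly one row per country, so if two users tie for the highest count, only one of them shows up, and which one is arbitrary unless you add another ORDER BY column.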

20jt8wwn  #2

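DISTINCT ON is good and simple for getting one (as the word "distinct" implies) user with the most ratings per country. See:

  • Select first row in each GROUP BY group?

But you want to ...
find those users with the most ratings in each country.
More than one user can have the most ratings in a country.
I suggest aggregating the ratings first, then joining to the users table, all in a CTE. Then pick the one or more winners per country in a LATERAL subquery using WITH TIES: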

WITH agg AS (
   SELECT u.iso_3166, u.uid, u.username, r.numratings
   FROM  (
      SELECT uid, count(*) AS numratings
      FROM   rating r
      GROUP  BY 1
      ) r
   JOIN   users u USING (uid)
   )
SELECT c.countryname, a.username, a.numratings
FROM   country c
LEFT   JOIN LATERAL (
   SELECT *
   FROM   agg a
   WHERE  a.iso_3166 = c.iso_3166
   ORDER  BY a.numratings DESC
   FETCH  FIRST 1 ROWS WITH TIES  -- !
   ) a ON true;
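
To see what WITH TIES changes compared with a plain LIMIT 1, here is a minimal sketch on made-up data (the user names and counts are hypothetical and represent a single country). It assumes PostgreSQL 13 or later, where WITH TIES was added:

```sql
-- Hypothetical rating counts for one country; two users tie at the top.
SELECT username, numratings
FROM  (
   VALUES ('anna', 5)
        , ('ben',  5)
        , ('carl', 3)
   ) AS counts (username, numratings)
ORDER  BY numratings DESC
FETCH  FIRST 1 ROWS WITH TIES;  -- keeps every row that ties with the first one
-- Returns both anna and ben (5 ratings each); carl is dropped.
```

WITH TIES requires an ORDER BY clause, since the ordering is what defines a tie. With LIMIT 1 instead, only one of the two tied users would be returned.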

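About "aggregate first, join later":

  • Multiple array_agg() calls in a single query
  • Join a count query on generate_series() and retrieve Null values as '0'

About WITH TIES:

  • Get top row(s) with highest value, with ties

About LATERAL:

  • What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?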

Notably, you don't want to GROUP BY Country.CID. country.ISO_3166 is the PK, so use that instead. (I optimized the query so that I don't need country in GROUP BY at all.)
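
As an illustration of that last point, here is a sketch of the per-user count from the question, grouped by primary keys rather than by Country.CID. It assumes the tables from the question exist as defined there; the aliases c, u, and r are introduced here for brevity. Because PostgreSQL recognizes columns that are functionally dependent on a grouped primary key, CountryName and Username can appear in the SELECT list without being listed in GROUP BY:

```sql
-- Group by the primary keys of Country and Users.
-- countryname depends on c.iso_3166 and username on u.uid,
-- so neither has to be repeated in GROUP BY.
SELECT c.countryname, u.username, count(r.rating) AS numratings
FROM   country c
JOIN   users   u ON u.iso_3166 = c.iso_3166
JOIN   rating  r ON r.uid = u.uid
GROUP  BY c.iso_3166, u.uid
ORDER  BY c.countryname;
```

Grouping by u.uid also keeps two users who happen to share a username in the same country as separate rows, which grouping by Username would merge.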
