apache配置单元查询hiveql

jpfvwuh4  于 2021-05-30  发布在  Hadoop
关注(0)|答案(1)|浏览(463)

我正在学习hive,想编写一个优化的hiveql/sql查询
我的table是这样的:

CREATE TABLE sales (dealer VARCHAR(25), make VARCHAR(25), type VARCHAR(25), day INT);
INSERT INTO sales (dealer, make, type, day) VALUES
("Xyz", "Highlander", "SUV", "0"),
("Xyz", "Prius", "HATCH", "1"),
("Xyz", "Prius", "HATCH", "2"),
("Xyz", "Prius", "HATCH", "3"),
("Xyz", "Versa", "HATCH", "1"),
("Xyz", "Versa", "HATCH", "2"),
("Xyz", "Versa", "HATCH", "3"),
("Xyz", "S3", "SEDAN", "1"),
("Xyz", "S3", "SEDAN", "2"),
("Abc", "Forrester", "SUV", "1");

给定一个“dealer”d,我想在一个查询中计算过去x天中每个“type”的前n个“make”。

SELECT dealer, make, type, COUNT(*) AS frequency FROM sales
WHERE day > 0 AND dealer LIKE 'Xyz' GROUP BY make, type
ORDER BY frequency DESC LIMIT 5

问题是,当使用group by on“make”和“type”作为top 1时,我只会得到:

DEALER, MAKE, TYPE, COUNT
Xyz, Prius, Hatch, 3
Xyz, Versa, Hatch, 3
Xyz, S3, Sedan, 2
...

但我想

Xyz, Prius, Hatch, 3
Xyz, S3, Sedan, 2
...

对于每个“类型”,前n。
有人能帮我理解如何写这样一个查询吗?
sql函数iddlehttp://sqlfiddle.com/#!2/df9304/5号

更新

似乎rank()会有用:
通过查询获取组中前n个记录的配置单元
https://blogs.oracle.com/taylor22/entry/hive_0_11_may_15
hiveql和rank()

shyt4zoc

shyt4zoc1#

在阅读了更多的文档和链接问题的提示之后:

SELECT dealer, make, rank, type FROM (
    SELECT dealer, make, rank() OVER (PARTITION BY type ORDER BY count DESC) AS rank, type FROM (
        SELECT dealer, make, count(*) AS count, type FROM Sales WHERE dealer = "Xyz" GROUP BY dealer, type, make
    ) CountedSales
) RankedSales
WHERE RankedSales.rank < 3;

内部查询执行计数,中间查询执行rank(),外部查询限制rank。
销售表目录

hive> select * from Sales;
OK
Xyz      Highlander      SUV    NULL
Xyz      Highlander      SUV    NULL
Xyz      Rouge   SUV    NULL
Xyz      Rouge   SUV    NULL
Xyz      Prius   HATCH  NULL
Xyz      Prius   HATCH  NULL
Xyz      Prius   HATCH  NULL
Xyz      Versa   HATCH  NULL
Xyz      S3      SEDAN  NULL
Xyz      S3      SEDAN  NULL
Xyz      S3      SEDAN  NULL
Xyz      A8      SEDAN  NULL
Xyz      A8      SEDAN  NULL
Xyz      A8      SEDAN  NULL
Xyz      A8      SEDAN  NULL
Time taken: 0.054 seconds, Fetched: 15 row(s)

现在是实际的查询。

hive> SELECT dealer, make, rank, type FROM (                                                                          
    >     SELECT dealer, make, rank() OVER (PARTITION BY type ORDER BY count DESC) AS rank, type FROM (
    >         SELECT dealer, make, count(*) AS count, type FROM Sales WHERE dealer = "Xyz" GROUP BY dealer, type, make
    >     ) CountedSales
    > ) RankedSales
    > WHERE RankedSales.rank < 3;
...
Execution completed successfully
MapredLocal task succeeded
OK
Xyz      Prius  1        HATCH
Xyz      Versa  2        HATCH
Xyz      A8     1        SEDAN
Xyz      S3     2        SEDAN
Xyz      Rouge  1        SUV
Xyz      Highlander     1        SUV
Time taken: 28.491 seconds, Fetched: 6 row(s)

相关问题