使用sparksql窗口函数获取每个窗口的行数

kb5ga3dv  于 2021-05-27  发布在  Spark
关注(0)|答案(2)|浏览(753)

我想使用sparksql窗口函数来做一些聚合和窗口处理。
假设我使用这里提供的示例表a:https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html

我想运行查询,给我的最大2收入为每个类别,也为每个类别的产品计数。
在我运行这个查询之后

SELECT
  product,
  category,
  revenue
FROM (
  SELECT
    product,
    category,
    revenue,
    dense_rank() OVER (PARTITION BY category ORDER BY revenue DESC) as rank
    count(*) OVER (PARTITION BY category ORDER BY revenue DESC) as count
  FROM productRevenue) tmp
WHERE
  rank <= 2

我的table是这样的:

product category    revenue count
pro2    tablet  6500    1
mini    tablet  5500    2

而不是

product category    revenue count
pro2    tablet  6500    5
mini    tablet  5500    5

这正是我所期望的。
我应该如何编写代码来获得每个类别的正确计数(而不是使用另一个单独的groupby语句)?

qybjjes1

qybjjes11#

Spark 具有order by window的if window子句默认为 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW .
为您的案件添加 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 在count()窗口子句中。 Try with: ```
SELECT
product,
category,
revenue,count
FROM (
SELECT
product,
category,
revenue,
dense_rank() OVER (PARTITION BY category ORDER BY revenue DESC) as rank,
count(
) OVER (PARTITION BY category ORDER BY revenue DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as count
FROM productRevenue) tmp
WHERE
rank <= 2

ktca8awb

ktca8awb2#

改变 count(*) OVER (PARTITION BY category ORDER BY revenue DESC) as countcount(*) OVER (PARTITION BY category ORDER BY category DESC) as count . 你会得到预期的结果。
试试下面的代码。

scala> spark.sql("""SELECT
     |   product,
     |   category,
     |   revenue,
     |   rank,
     |   count
     | FROM (
     |   SELECT
     |     product,
     |     category,
     |     revenue,
     |     dense_rank() OVER (PARTITION BY category ORDER BY revenue DESC) as rank,
     |     count(*) OVER (PARTITION BY category ORDER BY category DESC) as count
     |   FROM productRevenue) tmp
     | WHERE
     |   tmp.rank <= 2 """).show(false)

+----------+----------+-------+----+-----+
|product   |category  |revenue|rank|count|
+----------+----------+-------+----+-----+
|Pro2      |tablet    |6500   |1   |5    |
|Mini      |tablet    |5500   |2   |5    |
|Thin      |cell phone|6000   |1   |5    |
|Very thin |cell phone|6000   |1   |5    |
|Ultra thin|cell phone|5000   |2   |5    |
+----------+----------+-------+----+-----+

相关问题