PySpark: how to show the highest and lowest numbers in two rows?

uqjltbpv  posted on 2022-11-01  in Spark

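I want to get only the highest and the lowest number, as two rows, but my query returns the whole output. Should I use the dense_rank or the rank window function?
Result columns: eco, eco_name, number_of_occurences. The final result should have only two rows.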

7fyelxc5  1#

Hi, try a WITH clause to store a query and reuse it, like this:

with my_select as (
    -- count how many games use each opening
    select a.eco, b.eco_name, count(b.eco_name) as occurance
    from `game`.`chess_game` as a
    join `game`.`chess_eco_codes` as b on a.eco = b.eco
    group by a.eco, b.eco_name
)
-- keep only the rows with the highest or the lowest count
select *
from my_select
where occurance = (select max(occurance) from my_select)
   or occurance = (select min(occurance) from my_select)
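
To check the logic of this query without a Spark cluster, here is a minimal sketch that runs the same CTE against Python's built-in sqlite3 module. The rows and opening names are made up for illustration, and the table names drop the `game` schema prefix because the in-memory SQLite database has no such schema. The ORDER BY is added only to make the printed order predictable.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chess_game (eco TEXT);
    CREATE TABLE chess_eco_codes (eco TEXT, eco_name TEXT);
    INSERT INTO chess_game VALUES
        ('X01'), ('X01'), ('X01'), ('X02'), ('X03'), ('X03');
    INSERT INTO chess_eco_codes VALUES
        ('X01', 'Opening one'),
        ('X02', 'Opening two'),
        ('X03', 'Opening three');
""")

query = """
    WITH my_select AS (
        SELECT a.eco, b.eco_name, COUNT(b.eco_name) AS occurance
        FROM chess_game AS a
        JOIN chess_eco_codes AS b ON a.eco = b.eco
        GROUP BY a.eco, b.eco_name
    )
    SELECT * FROM my_select
    WHERE occurance = (SELECT MAX(occurance) FROM my_select)
       OR occurance = (SELECT MIN(occurance) FROM my_select)
    ORDER BY occurance DESC
"""

# Only the most frequent and the least frequent opening remain.
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this sample data the middle opening (two games) is filtered out and two rows remain. If several openings tie for the highest or lowest count, all of them are returned, so the result can have more than two rows.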

50pmv0ei  2#

If you are using PySpark, you should learn how to write this the Python way, not only in SQL.

from pyspark.sql import functions as F
from pyspark.sql.window import Window as W

(df
    .withColumn('rank_asc', F.dense_rank().over(W.orderBy(F.asc('number_of_occurance'))))
    .withColumn('rank_desc', F.dense_rank().over(W.orderBy(F.desc('number_of_occurance'))))
    .where((F.col('rank_asc') == 1) | (F.col('rank_desc') == 1))
    # .drop('rank_asc', 'rank_desc') # to drop these two temp columns 
    .show()
)

+---+--------------------+-------------------+--------+---------+
|eco|            eco_name|number_of_occurance|rank_asc|rank_desc|
+---+--------------------+-------------------+--------+---------+
|C42|      Petrov Defense|                 64|       9|        1|
|D27|Queen's Gambit Ac...|                 32|       1|        9|
+---+--------------------+-------------------+--------+---------+
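
On the question of rank versus dense_rank: the two only differ in how they number rows after a tie, so filtering on rank 1 should select the same rows with either function. The plain-Python sketch below imitates the two numbering schemes on made-up counts. The helper names `rank` and `dense_rank` are hypothetical stand-ins for the SQL semantics, not the PySpark functions themselves.

```python
def rank(values, descending=False):
    """SQL-style RANK: tied values share a rank, later ranks skip ahead."""
    ordered = sorted(values, reverse=descending)
    # Position of the first occurrence in sorted order, 1-based.
    return [ordered.index(v) + 1 for v in values]


def dense_rank(values, descending=False):
    """SQL-style DENSE_RANK: tied values share a rank, no gaps afterwards."""
    distinct = sorted(set(values), reverse=descending)
    return [distinct.index(v) + 1 for v in values]


# Hypothetical occurrence counts with ties at both ends.
counts = [64, 64, 50, 32, 32]

r_desc = rank(counts, descending=True)
d_desc = dense_rank(counts, descending=True)
print(r_desc)  # [1, 1, 3, 4, 4]
print(d_desc)  # [1, 1, 2, 3, 3]

# Rows selected by "rank == 1" are the same under both schemes.
top_by_rank = [c for c, r in zip(counts, r_desc) if r == 1]
top_by_dense = [c for c, r in zip(counts, d_desc) if r == 1]
print(top_by_rank == top_by_dense)  # True
```

The same sample also shows that ties produce more than one row per extreme, so the "two rows" expectation holds only when the maximum and minimum counts are each unique.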
