scala—编写代码,显示每个部门最高工资员工和其他员工之间的工资差异

ijxebb2r  于 2021-05-29  发布在  Spark
关注(0)|答案(1)|浏览(413)

(工资最高的员工比其他同事多拿多少)按部门排序结果,并说明与最高工资的差异。输入数据:

val inputDf = Seq(
  (1, "Hunter Fields", "IT", 15),
  (2, "Leonard Lewis", "Support", 81),
  (3, "Jason Dawson", "Support", 90),
  (4, "Andre Grant", "Support", 25),
  (5, "Earl Walton", "IT", 40),
  (6, "Alan Hanson", "IT", 24),
  (7, "Clyde Matthews", "Support", 31),
  (8, "Josephine Leonard", "Support", 1),
  (9, "Owen Boone", "HR", 27),
  (10, "Max McBride", "IT", 75)
).toDF("id", "name", "department", "salary")
println("Input:")
inputDf.show(false)

我的解决方案是:

inputDf
import org.apache.spark.sql.expressions.Window
val windowSpec = Window
  .partitionBy($"department")
  .orderBy($"salary")
inputDf
  .withColumn("diff", first(col("salary")).over(windowSpec)-$"salary")
  .withColumn("diff", last(col("salary")).over(windowSpec)-$"salary")

输入:

+---+-----------------+----------+------+
|id |name             |department|salary|
+---+-----------------+----------+------+
|1  |Hunter Fields    |IT        |15    |
|2  |Leonard Lewis    |Support   |81    |
|3  |Jason Dawson     |Support   |90    |
|4  |Andre Grant      |Support   |25    |
|5  |Earl Walton      |IT        |40    |
|6  |Alan Hanson      |IT        |24    |
|7  |Clyde Matthews   |Support   |31    |
|8  |Josephine Leonard|Support   |1     |
|9  |Owen Boone       |HR        |27    |
|10 |Max McBride      |IT        |75    |
+---+-----------------+----------+------+

预期:

+---+-----------------+----------+------+----+
|id |name             |department|salary|diff|
+---+-----------------+----------+------+----+
|9  |Owen Boone       |HR        |27    |0   |
|1  |Hunter Fields    |IT        |15    |60  |
|6  |Alan Hanson      |IT        |24    |51  |
|5  |Earl Walton      |IT        |40    |35  |
|10 |Max McBride      |IT        |75    |0   |
|8  |Josephine Leonard|Support   |1     |89  |
|4  |Andre Grant      |Support   |25    |65  |
|7  |Clyde Matthews   |Support   |31    |59  |
|2  |Leonard Lewis    |Support   |81    |9   |
|3  |Jason Dawson     |Support   |90    |0   |
+---+-----------------+----------+------+----+

但目前我的实际结果是不正确的,因为diff column是0

avwztpqn

avwztpqn1#

应将窗口顺序改为降序:

val windowSpec = Window.partitionBy($"department").orderBy($"salary".desc)

然后:

scala> inputDf.withColumn("diff", first(col("salary")).over(windowSpec)-$"salary").show()
+---+-----------------+----------+------+----+
| id|             name|department|salary|diff|
+---+-----------------+----------+------+----+
|  9|       Owen Boone|        HR|    27|   0|
| 10|      Max McBride|        IT|    75|   0|
|  5|      Earl Walton|        IT|    40|  35|
|  6|      Alan Hanson|        IT|    24|  51|
|  1|    Hunter Fields|        IT|    15|  60|
|  3|     Jason Dawson|   Support|    90|   0|
|  2|    Leonard Lewis|   Support|    81|   9|
|  7|   Clyde Matthews|   Support|    31|  59|
|  4|      Andre Grant|   Support|    25|  65|
|  8|Josephine Leonard|   Support|     1|  89|
+---+-----------------+----------+------+----+

相关问题