Scala: pivoting multiple columns with a condition on one column

cxfofazt  posted on 2022-12-23  in Scala

I need to pivot multiple columns in Spark with Scala, as shown here: Input DataFrame, Output DataFrame

  • The values range between 1 and 6. Thanks.

I tried several different pivots, but none of them produced the result I need.

bfnvny8b #1

I'm not sure about a solution based on pivot, but here is another, *out-of-the-box* approach.
Suppose data contains:

+------+-----+----+
|Rax   |RSECT|rste|
+------+-----+----+
|CUST1 |1    |aa  |
|CUST2 |2    |aa  |
|CUST3 |3    |aa  |
|CUST4 |4    |aa  |
|CUST5 |5    |aa  |
|CUST6 |6    |aa  |
|CUST7 |1    |bb  |
|CUST8 |2    |bb  |
|CUST9 |3    |bb  |
|CUST10|4    |bb  |
|CUST11|5    |bb  |
|CUST12|6    |bb  |
+------+-----+----+

We can use groupBy with collect_list, and finally selectExpr to extract the values, like this:

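import org.apache.spark.sql.functions.{array, collect_list}

data
  .groupBy("rste")
  // One [Rax, RSECT] pair per row; collect_list does not guarantee element order
  .agg(collect_list(array("Rax", "RSECT")).as("array"))
  // expressions is defined below
  .selectExpr(
    Array("rste") ++ expressions: _*
  )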

where expressions can be built as:

val nrElements = 6 // or aggregate and collect the maximum group size, whatever you need
// Two SQL expressions per position: one for Rax, one for RSECT
val expressions: Array[String] = (0 until nrElements).flatMap { i =>
  Seq(s"array[$i][0] as Rax${i + 1}", s"array[$i][1] as RSECT${i + 1}")
}.toArray
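
The comment on nrElements mentions aggregating the data to find the maximum instead of hard-coding 6. A minimal sketch of one way to do that follows; the object GroupSizeSketch and the helper maxGroupSize are hypothetical names introduced here, not part of the original answer. Groups with fewer rows than nrElements would index past the end of the collected array, which, depending on Spark's ANSI setting, either yields null or raises an error.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.max

object GroupSizeSketch {
  // Hypothetical helper: number of rows in the largest group of `key`.
  // Returns 0 for an empty DataFrame (max over no rows is null).
  def maxGroupSize(data: DataFrame, key: String): Int = {
    val row = data.groupBy(key).count().agg(max("count")).first()
    if (row.isNullAt(0)) 0 else row.getLong(0).toInt
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("group-size-sketch")
      .getOrCreate()
    import spark.implicits._

    // Small sample with the same columns as the answer's data
    val data = Seq(
      ("CUST1", 1, "aa"),
      ("CUST2", 2, "aa"),
      ("CUST7", 1, "bb")
    ).toDF("Rax", "RSECT", "rste")

    val nrElements = maxGroupSize(data, "rste")
    println(nrElements) // 2 for this sample: "aa" has two rows, "bb" has one

    spark.stop()
  }
}
```

This runs an extra Spark job over the data, so a hard-coded value is cheaper when the maximum is known in advance.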

With nrElements = 6, expressions evaluates to:

Array(array[0][0] as Rax1, array[0][1] as RSECT1, array[1][0] as Rax2, array[1][1] as RSECT2, array[2][0] as Rax3, array[2][1] as RSECT3, array[3][0] as Rax4, array[3][1] as RSECT4, array[4][0] as Rax5, array[4][1] as RSECT5, array[5][0] as Rax6, array[5][1] as RSECT6)

These strings can be used as SQL expressions.
The complete output looks like this:

+----+-----+------+-----+------+-----+------+------+------+------+------+------+------+
|rste|Rax1 |RSECT1|Rax2 |RSECT2|Rax3 |RSECT3|Rax4  |RSECT4|Rax5  |RSECT5|Rax6  |RSECT6|
+----+-----+------+-----+------+-----+------+------+------+------+------+------+------+
|aa  |CUST1|1     |CUST2|2     |CUST3|3     |CUST4 |4     |CUST5 |5     |CUST6 |6     |
|bb  |CUST7|1     |CUST8|2     |CUST9|3     |CUST10|4     |CUST11|5     |CUST12|6     |
+----+-----+------+-----+------+-----+------+------+------+------+------+------+------+
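
One caveat: the column order in the output above relies on collect_list returning rows in input order, which Spark does not guarantee after a shuffle. The sketch below is a variant, not the answer's original code, that collects structs with RSECT as the first field and sorts them with sort_array, so position i always holds the i-th smallest RSECT. The names OrderedPivotSketch, pivotByPosition and pairs are assumptions made for illustration. Using struct instead of array also keeps each column's own type rather than coercing both to a common one.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, sort_array, struct}

object OrderedPivotSketch {
  // Hypothetical helper: one output row per rste, with Rax/RSECT pairs
  // laid out by ascending RSECT.
  def pivotByPosition(data: DataFrame, nrElements: Int): DataFrame = {
    val expressions = (0 until nrElements).flatMap { i =>
      Seq(s"pairs[$i].Rax as Rax${i + 1}", s"pairs[$i].RSECT as RSECT${i + 1}")
    }
    data
      .groupBy("rste")
      // Structs sort field by field, so RSECT (first field) drives the order
      .agg(sort_array(collect_list(struct(col("RSECT"), col("Rax")))).as("pairs"))
      .selectExpr("rste" +: expressions: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ordered-pivot-sketch")
      .getOrCreate()
    import spark.implicits._

    // Deliberately shuffled input rows
    val data = Seq(
      ("CUST3", 3, "aa"), ("CUST1", 1, "aa"), ("CUST2", 2, "aa"),
      ("CUST8", 2, "bb"), ("CUST9", 3, "bb"), ("CUST7", 1, "bb")
    ).toDF("Rax", "RSECT", "rste")

    pivotByPosition(data, 3).show(false)
    spark.stop()
  }
}
```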

Good luck!
