PySpark: explode Spark DataFrame code columns into rows with an order

Posted by ctrmrzij on 2022-11-01 in Spark

I have data like the following.

Input DF

+----------+---------------+----------+----------+----------+
|SALES_NO  |SALE_LINE_NUM  | CODE_1   | CODE_3   | CODE_2   |
+----------+---------------+----------+----------+----------+
|123       |1              | ABC      | E456     | GHF989   |
|123       |2              | EDF      | EFHJ     | WAEWA    |
|234       |1              | 2345     | 985E     | AWW      |
|234       |2              | WERWE    |          |          |
|234       |3              | ERC      | AERER    |          |
|456       |1              | WER      | AWER     |          |
+----------+---------------+----------+----------+----------+

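The output should be built as follows: for each unique (SALES_NO, SALE_LINE_NUM), create a new row for every code column whose value is not empty, and give each row an order based on the code column it came from.
For CODE_1 the order is 1.
For CODE_2 the order is 2.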

Output DF

SALES_NO  SALES_LINE_NUM   CODE    ORDER
123          1              ABC      1
123          1              E456     2
123          1              GHF989   3
123          2              EDF      1
123          2              EFHJ     2
123          2              WAEWA    3
234          1              2345     1
234          1              985E     2
234          1              AWW      3
234          2              WERWE    1
234          3              ERC      1
234          3              AERER    2
456          1              WER      1
456          1              AWER     2
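
The rule described above can be sketched in plain Python, without Spark, to pin down the intended result. This is only an illustration under two assumptions: the codes are read in the left-to-right column order of the input table (CODE_1, CODE_3, CODE_2), and ORDER counts only the non-empty codes within each sales line. The names `ROWS` and `explode_codes` are hypothetical and not part of the original post.

```python
# Sample rows from the question: (SALES_NO, SALE_LINE_NUM, CODE_1, CODE_3, CODE_2)
ROWS = [
    (123, 1, "ABC", "E456", "GHF989"),
    (123, 2, "EDF", "EFHJ", "WAEWA"),
    (234, 1, "2345", "985E", "AWW"),
    (234, 2, "WERWE", "", ""),
    (234, 3, "ERC", "AERER", ""),
    (456, 1, "WER", "AWER", ""),
]


def explode_codes(rows):
    """Return one (SALES_NO, SALE_LINE_NUM, CODE, ORDER) tuple per non-empty code."""
    out = []
    for sales_no, line_num, *codes in rows:
        order = 0
        for code in codes:
            # Assumption: None and blank strings both count as "empty".
            if code is None or code.strip() == "":
                continue
            order += 1
            out.append((sales_no, line_num, code, order))
    return out


if __name__ == "__main__":
    for row in explode_codes(ROWS):
        print(row)
```

On the sample data this yields 14 rows, one per non-empty code, in the same sequence as the Output DF table.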

Can anyone help? Thanks in advance.

Answer #1, by 7kqas0il

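For this dataset:

import spark.implicits._  // provides .toDF on an RDD of tuples (already in scope in spark-shell)

// Sample data from the question; the column order is CODE_1, CODE_3, CODE_2
var ds = spark.sparkContext.parallelize(Seq(
  (123, 1, "ABC", "E456", "GHF989"),
  (123, 2, "EDF", "EFHJ", "WAEWA"),
  (234, 1, "2345", "985E", "AWW"),
  (234, 2, "WERWE", "", ""),
  (234, 3, "ERC", "AERER", ""),
  (456, 1, "WER", "AWER", "")
)).toDF("SALES_NO", "SALE_LINE_NUM", "CODE_1", "CODE_3", "CODE_2")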

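We can unpivot the code columns with stack, as follows:

// stack(3, ...) emits three (CODE, ORDER) rows per input row;
// ORDER is the string label paired with each code column
ds = ds.selectExpr(
  "SALES_NO",
  "SALE_LINE_NUM",
  "stack(3, CODE_1, '1', CODE_2, '2', CODE_3, '3') as (CODE, ORDER)"
)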

This should give you what you are after (first 10 rows shown):

+--------+-------------+------+-----+
|SALES_NO|SALE_LINE_NUM|CODE  |ORDER|
+--------+-------------+------+-----+
|123     |1            |ABC   |1    |
|123     |1            |GHF989|2    |
|123     |1            |E456  |3    |
|123     |2            |EDF   |1    |
|123     |2            |WAEWA |2    |
|123     |2            |EFHJ  |3    |
|234     |1            |2345  |1    |
|234     |1            |AWW   |2    |
|234     |1            |985E  |3    |
|234     |2            |WERWE |1    |
+--------+-------------+------+-----+
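
Since the question asks for PySpark, here is a minimal sketch of the same `stack` unpivot in Python. It goes a little beyond the snippet above, under stated assumptions: rows whose code is null or blank are dropped, the codes are taken in the input table's column order (CODE_1, CODE_3, CODE_2), and ORDER is renumbered consecutively per (SALES_NO, SALE_LINE_NUM) with `row_number`. The helper names `build_stack_expr` and `unpivot_codes` and the intermediate column `POS` are hypothetical, and the code assumes each (SALES_NO, SALE_LINE_NUM) pair identifies a single input row.

```python
def build_stack_expr(code_cols):
    """Build a Spark SQL stack() expression pairing each code column with its position."""
    pairs = ", ".join(f"{col}, {pos}" for pos, col in enumerate(code_cols, start=1))
    return f"stack({len(code_cols)}, {pairs}) as (CODE, POS)"


def unpivot_codes(df, code_cols=("CODE_1", "CODE_3", "CODE_2")):
    """Unpivot the code columns, drop empty codes, and number the rest per sales line."""
    # Imported here so build_stack_expr can be used without a Spark installation.
    from pyspark.sql import Window, functions as F

    # Number the surviving codes 1, 2, 3, ... within each sales line, by position.
    per_line = Window.partitionBy("SALES_NO", "SALE_LINE_NUM").orderBy("POS")
    return (
        df.selectExpr("SALES_NO", "SALE_LINE_NUM", build_stack_expr(code_cols))
        .where(F.col("CODE").isNotNull() & (F.trim(F.col("CODE")) != ""))
        .withColumn("ORDER", F.row_number().over(per_line))
        .select("SALES_NO", "SALE_LINE_NUM", "CODE", "ORDER")
        .orderBy("SALES_NO", "SALE_LINE_NUM", "ORDER")
    )
```

Calling `unpivot_codes(df).show(truncate=False)` on a DataFrame with the columns from the question should print one row per non-empty code. If the order must instead follow the column names (CODE_1, CODE_2, CODE_3), pass `code_cols=("CODE_1", "CODE_2", "CODE_3")`.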

For more information on unpivoting, see here.
Good luck!
