在pyspark/python中将一列转换为多列

up9lanfz  于 2021-07-13  发布在  Spark
关注(0)|答案(1)|浏览(508)

在这一行中,我发现了与我类似的情况,但他使用的是sql server,而不是pyspark/python:基于单个列旋转多个列
我的日期设置如下:

ID       Date             Class
1       2021/01/01        math, english
1       2021/01/02        math, english
1       2021/01/03        chinese
1       2021/01/04        math, chemistry
1       2021/01/05        math, english
1       2021/01/06        Chinese
2       2021/01/01        PE
2       2021/01/02        math, chinese
2       2021/01/03        math, english
2       2021/01/04        math, chinese
.......

期望输出应为:

ID       Date_1             schedule_1       Date_2       schedule_2     Date_3      schedule_3
1       2021/01/01        math, english      2021/01/03    chinese      2021/01/05   math, chemistry... 
1       2021/01/02        math, english      2021/01/06    chinese...
1       2021/01/05        math, english....
2       2021/01/01        PE                 2021/01/02    math, chinese     2021/01/03  math, english
2                                            2021/01/04    math, chinese

我计划使用pivot和groupby,这是我当前的代码,它不起作用,我不知道如何解决它。

line2 = line\
.select("ID")\
.groupBy("ID","Class")\
    .pivot("Date","Class")\
    .agg(expr("coalesce(first(Class), \" \")"))

任何帮助或想法或建议将不胜感激。

4urapxun

4urapxun1#

有点棘手,需要更多的争论:

import pyspark.sql.functions as F
from pyspark.sql import Window

df2 = df.withColumn(
    'rn',
    F.row_number().over(Window.partitionBy('ID', 'Class').orderBy('Date'))
).withColumn(
    'mindate',
    F.min('Date').over(Window.partitionBy('ID', 'Class'))
).withColumn(
    'rn2',
    F.dense_rank().over(Window.partitionBy('ID').orderBy('mindate'))
).groupBy('ID', 'rn').pivot('rn2').agg(
    F.first(F.struct('Date', 'Class'))
).orderBy('ID', 'rn')

df3 = df2.select(
    'ID',
    *[f'{c}.*' for c in df2.columns[2:]]
)

df3.show(truncate=False)
+---+----------+-------------+----------+-------------+----------+---------------+
|ID |Date      |Class        |Date      |Class        |Date      |Class          |
+---+----------+-------------+----------+-------------+----------+---------------+
|1  |2021/01/01|math, english|2021/01/03|chinese      |2021/01/04|math, chemistry|
|1  |2021/01/02|math, english|2021/01/06|chinese      |null      |null           |
|1  |2021/01/05|math, english|null      |null         |null      |null           |
|2  |2021/01/01|PE           |2021/01/02|math, chinese|2021/01/03|math, english  |
|2  |null      |null         |2021/01/04|math, chinese|null      |null           |
+---+----------+-------------+----------+-------------+----------+---------------+

相关问题