PySpark - elif statement and assign position to extract a value

ruarlubt, posted 2022-09-21 in Spark

I have a DataFrame similar to this example:

ProductNo|prodcuctMT|ProductPR|ProductList
-|-|-|-
2389|['XY-5','YZ-12','ZB-56','Iu-30']|['Pr-1','Pr-2','Pr-3','Pr-4']|['67230','7839','1339','9793']
6745|['XY-4','YZ-34','zb-8','Iu-9']|['Pr-6','Pr-1','Pr-3','Pr-7']|['1111','0987','8910','0348']

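I want to use elif-style logic with multiple conditions: first check ProductMT, and if it passes the condition, check ProductPR and take the position where the condition is met.

For example, if ProductMT contains XY-5 and ProductPR contains Pr-1, take that position and add a new column holding the value from ProductList at that position.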

ProductNo|prodcuctMT|ProductPR|ProductList|output
-|-|-|-|-
2389|['XY-5','YZ-12','ZB-56','Iu-30']|['Pr-1','Pr-2','Pr-3','Pr-4']|['67230','7839','1339','9793']|67230

I tried using filter, but it only handles a single condition. I need to apply several conditions, so that every row is checked against all of them.

```python
filtered = F.filter(
    F.arrays_zip('productList', 'prodcuctMT', 'productPR'),
    lambda x: (x.prodcuctMT == 'xy-5') & (x.productPR != 'pr-1')
)
df_array_pos = df_array.withColumn('output', filtered[0].productList).withColumn('flag', filtered[0].prodcuctMT)
```

Answer #1 (ux6nzvsh)

You just need to chain multiple `when` functions, one for each elif condition you need.
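To make the mapping concrete, here is a rough plain-Python picture of what a chain of `when` calls does for a single row: the branches are tried in order, the first match wins, and a row that matches nothing ends up with null (`None` here). The function name `output_for_row` and the second rule (`xy-4` with `pr-6`) are made up for illustration and are not part of the question.

```python
def output_for_row(product_mt, product_pr, product_list):
    """Row-wise analogue of a chained when(): first matching branch wins."""
    if 'xy-5' in product_mt and 'pr-1' in product_pr:
        # corresponds to the first when(condition, value)
        return product_list[product_mt.index('xy-5')]
    elif 'xy-4' in product_mt and 'pr-6' in product_pr:
        # corresponds to a further .when(...); hypothetical second rule
        return product_list[product_mt.index('xy-4')]
    else:
        # no branch matched: Spark would produce null
        return None


row1 = output_for_row(['xy-5', 'yz-12', 'zb-56', 'iu-30'],
                      ['pr-1', 'pr-2', 'pr-3', 'pr-4'],
                      ['67230', '7839', '1339', '9793'])
print(row1)
```

Note that Python's `list.index` is 0-based, whereas Spark's `array_position` is 1-based, which is why the Spark version needs a `- 1`.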

Your sample data:

df = spark.createDataFrame([
    (2389, ['xy-5', 'yz-12','zb-56','iu-30'], ['pr-1', 'pr-2', 'pr-3', 'pr-4'], ['67230','7839','1339','9793']),
    (6745, ['xy-4', 'yz-34','zb-8','iu-9'], ['pr-6', 'pr-1', 'pr-3', 'pr-7'], ['1111','0987','8910','0348']),
], ['productNo', 'productMT', 'productPR', 'productList'])

+---------+---------------------------+------------------------+-------------------------+
|productNo|productMT                  |productPR               |productList              |
+---------+---------------------------+------------------------+-------------------------+
|2389     |[xy-5, yz-12, zb-56, iu-30]|[pr-1, pr-2, pr-3, pr-4]|[67230, 7839, 1339, 9793]|
|6745     |[xy-4, yz-34, zb-8, iu-9]  |[pr-6, pr-1, pr-3, pr-7]|[1111, 0987, 8910, 0348] |
+---------+---------------------------+------------------------+-------------------------+

You can add as many `when` clauses as you like:

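from pyspark.sql import functions as F

(df
    .withColumn('output', F
        .when(
            # both arrays must contain their target value
            F.array_contains('productMT', 'xy-5') & F.array_contains('productPR', 'pr-1'),
            # array_position is 1-based, [] indexing is 0-based, hence the - 1
            F.col('productList')[F.array_position('productMT', 'xy-5') - 1]
        )
        # chain further .when(condition, value) calls here, one per elif branch
    )
    .show(10, False)
)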

+---------+---------------------------+------------------------+-------------------------+------+
|productNo|productMT                  |productPR               |productList              |output|
+---------+---------------------------+------------------------+-------------------------+------+
|2389     |[xy-5, yz-12, zb-56, iu-30]|[pr-1, pr-2, pr-3, pr-4]|[67230, 7839, 1339, 9793]|67230 |
|6745     |[xy-4, yz-34, zb-8, iu-9]  |[pr-6, pr-1, pr-3, pr-7]|[1111, 0987, 8910, 0348] |null  |
+---------+---------------------------+------------------------+-------------------------+------+
