pyspark 根据开始日期和结束日期列按月数将列值划分为多行

yrdbyhpb  于 12个月前  发布在  Spark
关注(0)|答案(2)|浏览(87)

我想将数量值分成多行,并按开始日期和结束日期列的月数划分。每一行都应该有月份的开始日期和结束日期。我也希望剩余数量的基础上以前的价值。下面是我的输入和输出示例。

yruzcnhs

yruzcnhs1#

根据我对输入和输出的理解,这里有一个例子

data_sdf. \
    withColumn('seq', 
               func.expr('sequence(trunc(start, "month"), last_day(end), interval 1 month)')
               ). \
    withColumn('seq_st_end', 
               func.transform('seq', 
                              lambda x, i: func.struct(x.alias('start'), 
                                                       func.last_day(x).alias('end'), 
                                                       (func.col('qty')/func.size('seq')).alias('qty'),
                                                       (func.col('qty') - ((func.col('qty')/func.size('seq')) * (i+1))).alias('remaining_qty')
                                                       )
                              )
               ). \
    selectExpr('inline(seq_st_end)'). \
    show(truncate=False)

# +----------+----------+-----+-------------+
# |start     |end       |qty  |remaining_qty|
# +----------+----------+-----+-------------+
# |2023-01-01|2023-01-31|400.0|1600.0       |
# |2023-02-01|2023-02-28|400.0|1200.0       |
# |2023-03-01|2023-03-31|400.0|800.0        |
# |2023-04-01|2023-04-30|400.0|400.0        |
# |2023-05-01|2023-05-31|400.0|0.0          |
# +----------+----------+-----+-------------+

我们可以使用sequence在start & end中创建一个month-date数组。使用数组,我们可以transform它来计算每个月的开始和结束日期,以及数量列。

iaqfqrcu

iaqfqrcu2#

使用SQL将列值拆分为多行

SET DATEFORMAT dmy;

DECLARE @dateranges TABLE( BegDate DATE, EndDate DATE,Qty int)

INSERT INTO @dateranges( BegDate, EndDate,qty)
VALUES( '01-01-2023',   '31-05-2023', 2000)

;WITH CTE AS    
(

    --initial part

    SELECT  BegDate AS StartOfMonth, DATEADD(DD, -1, DATEADD(MM, 1, BegDate)) AS EndOfMonth, EndDate,qty/datediff(month,begdate,dateadd(month,1,enddate)) as quantity, qty - qty/datediff(month,begdate,dateadd(month,1,enddate)) as rem_quantity
    FROM @dateranges

    -- recursive part
    UNION ALL

    SELECT  DATEADD(MM, 1, StartOfMonth) AS StartOfMonth,  
        DATEADD(DD, -1, DATEADD(MM, 2, StartOfMonth)) AS EndOfMonth,  
        EndDate, Quantity, Rem_quantity - Quantity as Rem_quantity  
    FROM CTE  
    WHERE DATEADD(MM, 1, StartOfMonth)< EndDate  
)
SELECT  StartOfMonth, EndOfMonth, Quantity, rem_quantity
FROM CTE

相关问题