pandas中的铅填充函数使用另一列的条件

hgc7kmma  于 2023-08-01  发布在  其他
关注(0)|答案(2)|浏览(94)

我有一个数据集,其中包含MMMM-YY格式的日期索引、促销的开始日期、折扣值和促销的结束日期。
具体如下:

events = pd.DataFrame({'yyyyww': ['2022-01','2022-02','2022-03', '2022-04','2022-05','2022-06','2022-07','2022-08','2022-09','2022-10'],
                         'promo_start': ['2022-01','Nan','2022-03','Nan','2022-05','2022-06','Nan','Nan','2022-09','Nan'],
                         'disc': ['0.1','Nan',0.2,'Nan',0.2,0.4,'Nan','Nan',0.5,'NaN'],
                         'promo_end': ['Nan', '2022-02','Nan','2022-04','2022-05','Nan','2022-07','Nan','Nan','2022-10']})

字符串
x1c 0d1x的数据
我尝试了各种组合groupbyffill操作,但无法产生所需的输出。
对于每一周在YYYY-MM我想能够评估促销活动是否通过做类似于铅填充操作的东西,使输出是一个布尔标志和折扣金额的dataphram,如下所示;

desired_output = pd.DataFrame({'yyyyww': ['2022-01','2022-02','2022-03', '2022-04','2022-05','2022-06','2022-07','2022-08','2022-09','2022-10'],
                         'promo_start': ['2022-01','Nan','2022-03','Nan','2022-05','2022-06','Nan','Nan','2022-09','Nan'],
                         'disc': ['0.1','Nan',0.2,'Nan',0.2,0.4,'Nan','Nan',0.5,'NaN'],
                         'promo_end': ['Nan', '2022-02','Nan','2022-04','2022-05','Nan','2022-07','Nan','Nan','2022-10'],
                         'promo_active': [True,True,True,True,True,True,True,False,True,True],
                         'promo_disc': [0.1,0.1,0.2,0.2,0.2,0.4,0.4,0,0.5,0.5]})


btxsgosb

btxsgosb1#

向前填充promo_start,然后按促销开始分组promo_end,然后向后填充值,现在检查空值以确定促销是否处于活动状态

g = events['promo_start'].ffill()
events['promo_active'] = events['promo_end'].groupby(g).bfill().notna()

个字符

js4nwp54

js4nwp542#

这是一个比舒伯姆更长的方法,但也不那么抽象:

# Test data
events = pd.DataFrame({'yyyyww': ['2022-01','2022-02','2022-03', '2022-04','2022-05','2022-06','2022-07','2022-08','2022-09','2022-10'],
                         'promo_start': ['2022-01','Nan','2022-03','Nan','2022-05','2022-06','Nan','Nan','2022-09','Nan'],
                         'disc': ['0.1','Nan',0.2,'Nan',0.2,0.4,'Nan','Nan',0.5,'NaN'],
                         'promo_end': ['Nan', '2022-02','Nan','2022-04','2022-05','Nan','2022-07','Nan','Nan','2022-10']})

# For loop to determine relevant periods
in_period = False
end_period = False

for i, row in events.iterrows():
    promo_start = events.at[i, "promo_start"]
    promo_end = events.at[i, "promo_end"]
    
    if promo_start != "Nan":
        in_period = True
        end_period = False
    
    if promo_end != "Nan":
        end_period=True 

    if in_period:
        events.at[i, "promo_start"] = "fill" if promo_start == "Nan" else promo_start
        events.at[i, "promo_end"] = "fill" if promo_end == "Nan" else promo_end

    in_period = False if end_period else in_period

# Fill values of relevant period
events = events.replace("fill", None)
events.promo_start = events.promo_start.ffill()
events.promo_end = events.promo_end.bfill()

# Create new column
def is_promo_active(row):
    return (row["yyyyww"] >= row["promo_start"]) and (row["yyyyww"] <= row["promo_end"])

events["promo_active"] = events.apply(lambda row: is_promo_active(row), axis=1)

字符串

相关问题