What is the correct way to apply np.select() one row at a time in numpy and pandas?

Posted by csbfibhn on 2023-06-04 in Other

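I have a method that scores every retailer. Each retailer should get a score that will later be used for clustering, but I need to score each retailer according to the target it is tagged with. There are two targets:

  1. balanced: a composite score based on several criteria, which I show in the code below
  2. nmv: targets retailers according to how high their nmv is.

Here is the code and what I tried: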
targets = ['balanced','nmv'] 

day_of_month = date.today().day

df['Score'] = 0

if day_of_month > 10: #If today is greater than the 10th day, do the dynamic targeting. Else, do the first 10 days plan

    for index, row in df.iterrows():
        target = row['target']

        if target == 'balanced':
            conditions = [
                (df['retailer_id'].isin(droppers['retailer_id'])), # Dropped From MP 

                (df['months_sr'] > 0.4) | (df['historical_sr'] > 0.4) & (df['orders_this_month_total'] >= 1), 
                
                (df['wallet_amount'] > 0) & (df['orders_this_month_total'] > 0), #Has Wallet Amount and still made no orders this month

                (df['orders_this_month_total'] == 1), # Ordered Once this month,
                
                ( (df[['nmv_this_month_total','nmv_one_month_ago_total','nmv_two_months_ago_total','nmv_three_months_ago_total']].fillna(0).pct_change(axis = 1).mean(axis = 1) ) > 0), # His nmv is making progress

                (df['skus_pct_change_q_cut'].isin(['med','high','extreme'])), # His orders are more likely to contain more than 3 SKUs

                (df['orders_one_month_ago_total'] >= 1) & (df['orders_this_month_total'] <= 1), # Ordered once this month or not at all and ordered last month once or more.

                (df[['orders_one_month_ago_total','orders_two_months_ago_total','orders_three_months_ago_total']].sum(axis = 1) > 0) & (df['orders_this_month_total'] >= 1), # Ordered At least in one of the previous three months and made one order this month

                (df[['orders_one_month_ago_total','orders_two_months_ago_total','orders_three_months_ago_total']].sum(axis = 1) > 0) & (df['orders_this_month_total'] <= 1), # Ordered At least in one of the previous three months and made none orders this month

                (df['sessions_this_month'] > 0) & (df['visits_this_month'] == 0), # Opens the app and we did not pay him a visit.
                
                (df['visits_this_month'] == 0) & (df['peak_week'] == wom) & ((df['months_sr'] >= 0.4) & (df['months_sr'] <= 1)) & (df['orders_this_month_total'] < 4), # This week is his peak week and he made less than 4 orders

                (df['peak_week']  < wom) & (df['orders_this_month_total'] == 0), # Missed their critical week

                (df['wallet_amount'] > 0),

                True
            ]
            results = list(range(len(conditions) - 1, -1, -1))  # define results for balanced target
        
        elif target == 'nmv':
            
            conditions = [
                (df['retailer_id'].isin(droppers['retailer_id'])), # Dropped From MP 
                        
                (df['visits_this_month'] == 0) & (df['peak_week'] == wom) & ((df['months_sr'] >= 0.4) & (df['months_sr'] <= 1)) & (df['orders_this_month_total'] == 0), # This week is his peak week 

                (df['visits_this_month'] == 0) & (df['historical_sr'] >= 0.4) & (df['orders_this_month_total'] == 0), # Overall Strike Rate is greater than 40%

                (df['nmv_q_cut_total'].isin(['high','extreme'])),
                
                (df['nmv_q_cut_total'].isin(['high','extreme'])) & ( (df['wallet_amount'] > 0) | (df['n_offers'] > 0) ),

                (df['months_nmv'].median() >= df['polygon_average_nmv']),

                (df['orders_one_month_ago'] > 0),

                (df['months_sessions_q_cut'] > 0),

                True
            ]
            results = list(range(len(conditions) - 1, -1, -1)) # define results for activation target

        df.loc[index, 'Score'] = np.select(conditions, results)
        df['Score'] = df['Score'].astype(int)

else:

    conditions = [
        (df['retailer_id'].isin(droppers['retailer_id'])), # Dropped From MP 
                
        (df['visits_this_month'] == 0) & (df['peak_week'] == wom) & ((df['months_sr'] >= 0.4) & (df['months_sr'] <= 1)), # This week is his peak week 

        (df['historical_sr'] >= 0.4), # Overall Strike Rate is greater than 40%
        
        (df['orders_one_month_ago'].isin([1,2,3,4])) & (df['nmv_one_month_ago'] >= 1500), 

        (df['orders_one_month_ago'].isin([1,2,3,4])),

        (df['orders_two_months_ago'].isin([1,2,3,4])),

        (df['orders_three_months_ago'].isin([1,2,3,4])),

        (df['last_visit_date'].dt.year == 2022) & (df['last_order_date'].dt.year == 2022), # Last Order Date And last Visit Date is in 2022

        (df['last_visit_date'].dt.year == 2023) & (df['last_order_date'].dt.year == 2023),
        
        True
    ]

    results = list(range(len(conditions) - 1, -1, -1))

    df['Score'] = np.select(conditions, results)

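As you can see, I give each retailer a score. This worked before, and I assumed that if I iterated over the rows of the DataFrame and assigned a score, I would get each retailer's final score under its specific target. However, judging from the error, it returns a list (I think):
ValueError: Must have equal len keys and value when setting with an iterable
Can you tell me the correct way to use np.select on individual rows?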

Answer #1 (g52tjvyc)

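As far as I can tell, your score is just the reversed index of the condition that applies. I would set Score to None as the default, then apply the rules one after another from least important to most important, setting each Score separately.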
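To illustrate that point, here is a minimal sketch showing that np.select with the most important condition first gives the same scores as applying the rules with df.loc from least to most important. The columns orders and wallet and the two rules are made up for the illustration, and default=0 stands in for the trailing True condition used in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the asker's real columns.
df = pd.DataFrame({"orders": [0, 1, 5], "wallet": [10.0, 0.0, 3.0]})

# Most important condition first, as in the question.
conditions = [df["orders"] == 0, df["wallet"] > 0]
results = list(range(len(conditions), 0, -1))  # [2, 1]

# np.select keeps the result of the FIRST condition that is True per row;
# rows matching no condition get the default.
via_select = np.select(conditions, results, default=0)

# Same outcome with df.loc-style assignment: start from the least
# important rule so that more important rules overwrite it.
via_loc = pd.Series(0, index=df.index)
for cond, score in reversed(list(zip(conditions, results))):
    via_loc.loc[cond] = score

print(via_select.tolist())
print(via_loc.tolist())
```

Both approaches operate on whole columns at once, so neither needs a loop over rows.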
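I think the problem is that you are iterating over the whole DataFrame, which seems unnecessary. Since no sample data was provided, I can only give a dummy example of what I would do: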

import pandas as pd

data = {'A': range(20), 'target': ['balanced', 'nmv']*10, 'B': [1,2,3,4]*5}
df = pd.DataFrame(data)

df['score'] = None
df.loc[(df['target'] == 'nmv') & (df['B'] % 2 == 0), 'score'] = 7
df.loc[(df['target'] == 'nmv') & (df['B'] % 2 == 1), 'score'] = 8 # not hit in this scenario
df.loc[(df['target'] == 'balanced') & (df['B'] == 1), 'score'] = 9
df.loc[(df['target'] == 'balanced') & (df['B'] == 3), 'score'] = 10

print(df)

Output:

     A    target  B score
0    0  balanced  1     9
1    1       nmv  2     7
2    2  balanced  3    10
3    3       nmv  4     7
4    4  balanced  1     9
5    5       nmv  2     7
6    6  balanced  3    10
7    7       nmv  4     7
8    8  balanced  1     9
9    9       nmv  2     7
10  10  balanced  3    10
11  11       nmv  4     7
12  12  balanced  1     9
13  13       nmv  2     7
14  14  balanced  3    10
15  15       nmv  4     7
16  16  balanced  1     9
17  17       nmv  2     7
18  18  balanced  3    10
19  19       nmv  4     7

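Keeping this order ensures that higher scores overwrite lower ones. Entries that no rule hits stay None, but you can easily make them 0 by initializing the column to 0 instead.
To help you better, it would be useful if you could provide sample data.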
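If you would rather keep np.select, one possible approach is to call it once per target on the matching subset of rows instead of once per row. In the question, np.select(conditions, results) returns one value for every row of df, and assigning that whole array to the single cell df.loc[index, 'Score'] is what triggers the ValueError. The sketch below uses made-up columns (orders, wallet) and hypothetical helper functions score_balanced and score_nmv. It is not the asker's actual rule set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real retailer table.
df = pd.DataFrame({
    "target": ["balanced", "nmv", "balanced", "nmv", "balanced"],
    "orders": [0, 0, 2, 3, 1],
    "wallet": [5.0, 4.0, 3.0, 1.0, 0.0],
})

def score_balanced(frame):
    # Illustrative rules only; most important condition first.
    conditions = [frame["orders"] == 0, frame["wallet"] > 0]
    return np.select(conditions, [2, 1], default=0)

def score_nmv(frame):
    # Illustrative rules only; most important condition first.
    conditions = [frame["orders"] >= 3, frame["wallet"] > 0]
    return np.select(conditions, [2, 1], default=0)

scorers = {"balanced": score_balanced, "nmv": score_nmv}

df["Score"] = 0
for target, scorer in scorers.items():
    mask = df["target"] == target
    # The array returned by the scorer has exactly mask.sum() entries,
    # so it lines up with the rows selected by the mask.
    df.loc[mask, "Score"] = scorer(df.loc[mask])

print(df)
```

One thing to watch with this design: any rule that relies on a whole-column statistic (a median, for example) would be computed on the subset passed to the scorer, so compute such values on the full DataFrame beforehand if that is what you intend.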
