pandas: improving code for the daily growth rate of orders per merchant?

w46czmvw posted on 2023-05-12 in Other

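I'm trying to get the daily growth rate of users for each merchant and add it to my dataframe as another column. I've managed to come up with code; however, I think the problem is that some merchants may have several consecutive days with 0 sales. Going from 0 to 0 has no defined growth rate, so I believe that is what is breaking my code. Any recommendations?
Here is the code I used (apologies for the length):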

# First I filter out the dates and order category
# week
df_wk = df.loc[(df['Booked at'] >= '2023-01-01') & (df['Booked at'] <= '2023-05-31')]

#placed eats orders
df_eats = df_wk.loc[df_wk['Job Type'] == 'Eats']

# Create a dataframe with all merchants and dates within the date range
dates = pd.date_range('2023-01-01', '2023-05-31', freq='D')
merchants = df_eats['Merchant Name'].unique()
index = pd.MultiIndex.from_product([dates, merchants], names=['Booked at', 'Merchant Name'])
all_sales = pd.DataFrame(index=index).reset_index()

#including all merchants even on zero sales days and getting number of unique users per merchant and day
eats_users = df_eats.groupby(['Booked at', 'Merchant Name']).agg({'Customer Name': lambda x: x.nunique()})
eats_users.rename(columns={'Customer Name': 'Total Unique Users'}, inplace=True)

# Sort the dataframe by 'Booked at' column to ensure chronological order
eats_users.sort_values('Booked at', inplace=True)

#merge all sales with new df
unique_users = pd.merge(all_sales, eats_users, on=['Booked at', 'Merchant Name'], how='left')
unique_users.fillna(0, inplace=True)

# Sort the dataframe by 'Booked at' column to ensure chronological order
unique_users.sort_values('Booked at', inplace=True)

# Add new column that calculates the daily growth rate per merchant
previous_users = {}
for index, row in unique_users.iterrows():
    current_users = row['Total Unique Users']
    merchant = index[1]  # Get the merchant name from the index tuple
    if pd.isna(current_users):
        unique_users.at[index, 'Daily Growth Rate for Unique Users'] = pd.NA
    else:
        if merchant in previous_users:
            previous_users_count = previous_users[merchant]
            growth_rate = (current_users / previous_users_count - 1) * 100
            unique_users.at[index, 'Daily Growth Rate for Unique Users'] = growth_rate
        else:
            unique_users.at[index, 'Daily Growth Rate for Unique Users'] = pd.NA
        previous_users[merchant] = current_users

The expected output looks like this:

Booked at Merchant Name  Total Unique Users  \
0   2023-01-01    Merchant A                  38   
1   2023-01-01    Merchant B                  75   
2   2023-01-01    Merchant C                  74   
3   2023-01-02    Merchant A                  33   
4   2023-01-02    Merchant B                  71   
..         ...           ...                 ...   
447 2023-05-30    Merchant A                  75   
449 2023-05-30    Merchant C                  13   
451 2023-05-31    Merchant B                  76   
450 2023-05-31    Merchant A                  52   
452 2023-05-31    Merchant C                   2   

     Daily Growth Rate for Unique Users  
0                                   NaN  
1                                   NaN  
2                                   NaN  
3                            -13.157895  
4                             -5.333333  
..                                  ...  
447                            5.633803  
449                          -45.833333  
451                           13.432836  
450                          -30.666667  
452                          -84.615385  

[453 rows x 4 columns]
Answer #1 (cx6n0qe3)

You provided neither sample data nor expected output, so here is one idea for how to do this. Let's start with an example df:

import pandas as pd
import numpy as np

dates = pd.date_range('2023-01-01', '2023-05-31')
merchants = ['Merchant A', 'Merchant B', 'Merchant C']
data = []
for date in dates:
    for merchant in merchants:
        sales = np.random.randint(0, 100)
        data.append((date, merchant, sales))

df = pd.DataFrame(data, columns=['Booked at', 'Merchant Name', 'Total Unique Users'])

df.sort_values('Booked at', inplace=True)
print(df)

which gives:

Booked at Merchant Name  Total Unique Users
0   2023-01-01    Merchant A                  38
1   2023-01-01    Merchant B                  75
2   2023-01-01    Merchant C                  74
3   2023-01-02    Merchant A                  33
4   2023-01-02    Merchant B                  71
..         ...           ...                 ...
447 2023-05-30    Merchant A                  75
449 2023-05-30    Merchant C                  13
451 2023-05-31    Merchant B                  76
450 2023-05-31    Merchant A                  52
452 2023-05-31    Merchant C                   2

[453 rows x 3 columns]

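Now you can condense your code to the following, which reads the merchant from the row itself and skips the calculation when the previous day's count is 0: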

previous_users = {}
for index, row in df.iterrows():
    current_users = row['Total Unique Users']
    merchant = row['Merchant Name']
    if pd.isna(current_users):
        df.at[index, 'Daily Growth Rate for Unique Users'] = pd.NA
    else:
        if merchant in previous_users and previous_users[merchant] != 0:
            previous_users_count = previous_users[merchant]
            growth_rate = (current_users / previous_users_count - 1) * 100
            df.at[index, 'Daily Growth Rate for Unique Users'] = growth_rate
        else:
            df.at[index, 'Daily Growth Rate for Unique Users'] = pd.NA
        previous_users[merchant] = current_users

print(df)

which returns:

Booked at Merchant Name  Total Unique Users  \
0   2023-01-01    Merchant A                  38   
1   2023-01-01    Merchant B                  75   
2   2023-01-01    Merchant C                  74   
3   2023-01-02    Merchant A                  33   
4   2023-01-02    Merchant B                  71   
..         ...           ...                 ...   
447 2023-05-30    Merchant A                  75   
449 2023-05-30    Merchant C                  13   
451 2023-05-31    Merchant B                  76   
450 2023-05-31    Merchant A                  52   
452 2023-05-31    Merchant C                   2   

     Daily Growth Rate for Unique Users  
0                                   NaN  
1                                   NaN  
2                                   NaN  
3                            -13.157895  
4                             -5.333333  
..                                  ...  
447                            5.633803  
449                          -45.833333  
451                           13.432836  
450                          -30.666667  
452                          -84.615385  

[453 rows x 4 columns]
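
As a possible alternative to the row-by-row loop, the same calculation can be sketched in vectorized form with `groupby(...).shift(1)`. This is only a sketch under assumptions: the helper name `add_daily_growth` and the small hand-made frame are hypothetical, and treating growth from a previous count of 0 as missing is a choice that mirrors the loop above, not the only reasonable one.

```python
import pandas as pd

# Hypothetical sample in the same shape as the example df (values are illustrative).
df = pd.DataFrame({
    'Booked at': pd.to_datetime(['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02',
                                 '2023-01-03', '2023-01-03', '2023-01-04', '2023-01-04']),
    'Merchant Name': ['Merchant A', 'Merchant B'] * 4,
    'Total Unique Users': [10, 0, 15, 0, 0, 4, 5, 8],
})

def add_daily_growth(frame):
    """Return a copy of frame with a per-merchant daily growth rate column (in percent)."""
    # Sort so that each merchant's rows are in chronological order.
    out = frame.sort_values(['Merchant Name', 'Booked at']).copy()
    users = out['Total Unique Users'].astype(float)
    # Previous day's count for the same merchant; NaN on each merchant's first day.
    previous = users.groupby(out['Merchant Name']).shift(1)
    # Assumption: a previous count of 0 has no defined growth rate, so mask it as NaN.
    previous = previous.where(previous != 0)
    out['Daily Growth Rate for Unique Users'] = (users / previous - 1) * 100
    # Restore the date-first ordering used in the rest of the thread.
    return out.sort_values(['Booked at', 'Merchant Name'])

result = add_daily_growth(df)
print(result)
```

Masking the zero denominators before dividing means that both 0 -> 0 and 0 -> n come out as NaN instead of NaN and inf. If a different convention is wanted for those days, only the `where` line needs to change.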

Hope this helps!
