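I'm trying to get the daily growth rate of users for each merchant and add it as another column to my dataframe. I've been able to come up with the code; however, I think the problem is that some merchants may have several consecutive days with 0 sales. Going from 0 to 0 has no growth rate, so I believe that is what's breaking my code. Any recommendations?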
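To see why the zero days cause trouble: pandas follows floating-point rules, so dividing by a previous-day count of 0 does not raise an error. Instead, 0/0 gives NaN and n/0 (n > 0) gives inf. A minimal illustration with made-up counts for a single merchant:

```python
import numpy as np
import pandas as pd

# Made-up daily unique-user counts for one merchant
users = pd.Series([0.0, 0.0, 5.0, 10.0])

# Growth in percent relative to the previous day
growth = (users / users.shift(1) - 1) * 100
print(growth.tolist())  # [nan, nan, inf, 100.0]

# One option: treat growth from a 0 baseline as undefined (NaN)
growth = growth.replace([np.inf, -np.inf], np.nan)
print(growth.tolist())  # [nan, nan, nan, 100.0]
```

Whether a 0 baseline should give NaN, 0, or some other value is a reporting decision; the code only makes the choice explicit.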
Below is the code I'm using (sorry, it's long):
import numpy as np
import pandas as pd

# First, keep only bookings in the date range (2023-01-01 to 2023-05-31)
df_wk = df.loc[(df['Booked at'] >= '2023-01-01') & (df['Booked at'] <= '2023-05-31')]
# Keep only placed Eats orders
df_eats = df_wk.loc[df_wk['Job Type'] == 'Eats']
# Create a dataframe with every (date, merchant) combination in the date range
dates = pd.date_range('2023-01-01', '2023-05-31', freq='D')
merchants = df_eats['Merchant Name'].unique()
index = pd.MultiIndex.from_product([dates, merchants], names=['Booked at', 'Merchant Name'])
all_sales = pd.DataFrame(index=index).reset_index()
# Number of unique users per merchant and day
eats_users = df_eats.groupby(['Booked at', 'Merchant Name']).agg({'Customer Name': 'nunique'})
eats_users.rename(columns={'Customer Name': 'Total Unique Users'}, inplace=True)
# Left-merge onto the full grid so every merchant appears even on zero-sales days (0 users)
unique_users = pd.merge(all_sales, eats_users, on=['Booked at', 'Merchant Name'], how='left')
unique_users.fillna(0, inplace=True)
# Sort by 'Booked at' so each merchant's rows are in chronological order
unique_users.sort_values('Booked at', inplace=True)
# Add a column with the daily growth rate per merchant.
# The rate stays NaN on a merchant's first day and whenever the previous day
# had 0 users, because growth from a baseline of 0 has no finite value.
unique_users['Daily Growth Rate for Unique Users'] = np.nan
previous_users = {}
for idx, row in unique_users.iterrows():
    current_users = row['Total Unique Users']
    # After the merge the index is a plain integer, so read the merchant from the row
    merchant = row['Merchant Name']
    previous_users_count = previous_users.get(merchant)
    if previous_users_count:  # False both for None (first day) and for 0
        growth_rate = (current_users / previous_users_count - 1) * 100
        unique_users.at[idx, 'Daily Growth Rate for Unique Users'] = growth_rate
    previous_users[merchant] = current_users
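The row-by-row loop can also be replaced with a vectorized sketch. It assumes a frame shaped like `unique_users` above (columns 'Booked at', 'Merchant Name', 'Total Unique Users'); the small frame here is made-up data standing in for it. `groupby('Merchant Name')[...].shift(1)` looks up each merchant's count on its previous row, so the frame must be in chronological order per merchant first. Growth from a 0 baseline comes out as NaN or inf, and the sketch maps both to NaN:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for `unique_users`: two merchants over three days
unique_users = pd.DataFrame({
    'Booked at': pd.to_datetime(['2023-01-01', '2023-01-01', '2023-01-02',
                                 '2023-01-02', '2023-01-03', '2023-01-03']),
    'Merchant Name': ['Merchant A', 'Merchant B'] * 3,
    'Total Unique Users': [38.0, 0.0, 33.0, 0.0, 0.0, 4.0],
})

# Chronological order within each merchant
unique_users = unique_users.sort_values(['Merchant Name', 'Booked at'])

# Each merchant's count on its previous day (NaN on the merchant's first day)
previous = unique_users.groupby('Merchant Name')['Total Unique Users'].shift(1)

# Percent growth vs. the previous day; a 0 baseline yields NaN (0/0) or inf (n/0)
growth = (unique_users['Total Unique Users'] / previous - 1) * 100

# Treat growth from a 0 baseline as undefined
unique_users['Daily Growth Rate for Unique Users'] = growth.replace([np.inf, -np.inf], np.nan)

print(unique_users.sort_index())
```

On the real data, the made-up frame would be dropped and the last three statements run on the `unique_users` built by the question's code.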
The expected output looks like this:
     Booked at  Merchant Name  Total Unique Users  Daily Growth Rate for Unique Users
0   2023-01-01     Merchant A                  38                                 NaN
1   2023-01-01     Merchant B                  75                                 NaN
2   2023-01-01     Merchant C                  74                                 NaN
3   2023-01-02     Merchant A                  33                          -13.157895
4   2023-01-02     Merchant B                  71                           -5.333333
..         ...            ...                 ...                                 ...
447 2023-05-30     Merchant A                  75                            5.633803
449 2023-05-30     Merchant C                  13                          -45.833333
451 2023-05-31     Merchant B                  76                           13.432836
450 2023-05-31     Merchant A                  52                          -30.666667
452 2023-05-31     Merchant C                   2                          -84.615385

[453 rows x 4 columns]
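In the output above, row 451 is printed before row 450. `sort_values('Booked at')` uses quicksort by default, which is not a stable sort, so rows that share a date can come out in any order. The per-merchant loop is unaffected, since each merchant has exactly one row per date, but sorting on both keys makes the order deterministic (`kind='stable'` on the single column would instead keep the merge order within each date). A small sketch with made-up rows:

```python
import pandas as pd

# Made-up rows: two dates, three merchants, deliberately out of order
frame = pd.DataFrame({
    'Booked at': pd.to_datetime(['2023-05-31', '2023-05-30', '2023-05-31',
                                 '2023-05-30', '2023-05-31', '2023-05-30']),
    'Merchant Name': ['Merchant B', 'Merchant C', 'Merchant A',
                      'Merchant A', 'Merchant C', 'Merchant B'],
})

# Sorting on both keys fixes the order of rows that share a date
ordered = frame.sort_values(['Booked at', 'Merchant Name'], ignore_index=True)
print(ordered)
```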
1 answer
cx6n0qe31
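You provided neither sample data nor the expected output, so here is an idea of how to do this. Let's start with an example df:
which is:
Now you can condense your code to:
which returns:
Hope this helps!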
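One way to condense the question's whole pipeline, sketched here under assumptions rather than taken from the answer, is to pivot to a wide table with one row per day and one column per merchant. The "previous day" is then simply the previous row, so no per-merchant bookkeeping is needed. The function name `daily_user_growth` and its `start`/`end` parameters are hypothetical; the sketch assumes `df` has the question's columns and that 'Booked at' holds dates with no time component:

```python
import numpy as np
import pandas as pd


def daily_user_growth(df, start, end):
    """Unique Eats customers per merchant and day, plus day-over-day growth in %.

    Hypothetical helper: assumes `df` has 'Booked at', 'Job Type',
    'Merchant Name' and 'Customer Name', with 'Booked at' holding dates only.
    """
    in_range = df['Booked at'].between(pd.Timestamp(start), pd.Timestamp(end))
    eats = df[in_range & (df['Job Type'] == 'Eats')]

    # Wide table: one row per day, one column per merchant; 0 where there were no orders
    users = eats.pivot_table(index='Booked at', columns='Merchant Name',
                             values='Customer Name', aggfunc='nunique', fill_value=0)
    days = pd.date_range(start, end, freq='D', name='Booked at')
    users = users.reindex(days, fill_value=0)

    # Growth vs. the previous day; a 0 baseline (0/0 or n/0) is left as NaN
    growth = (users / users.shift(1) - 1) * 100
    growth = growth.replace([np.inf, -np.inf], np.nan)

    # Back to long format: one row per (date, merchant)
    out = users.reset_index().melt(id_vars='Booked at', var_name='Merchant Name',
                                   value_name='Total Unique Users')
    out['Daily Growth Rate for Unique Users'] = (
        growth.reset_index().melt(id_vars='Booked at', var_name='Merchant Name')['value']
    )
    return out.sort_values(['Booked at', 'Merchant Name'], ignore_index=True)
```

Called as `daily_user_growth(df, '2023-01-01', '2023-05-31')`, it should produce the same four columns as the expected output in the question, with rows ordered by date and then merchant.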