Python Pandas -如何将市场和渠道的月度目标扩展为每日目标?

4c8rllxm  于 2023-03-11  发布在  Python
关注(0)|答案(1)|浏览(74)

很长一段时间以来,我一直在维护一个报告,显示不同市场和渠道的目标进度,我一直依赖于GoogleSheets中的一些功能(最常见的是拆分和扁平化),获取月度预算并将其拆分为每日目标,以便与来自另一个系统的数据相结合以获得每日计数,然后在Tableau Desktop中将其聚合到所需的任何时间段(例如,按周、月、年)。添加新市场、渠道等非常挑剔,而且Google Sheets已经变得太大,无法与Tableau连接。我想使用Python来简化这一过程。
我一直在研究的解决方案使用Python的Pandas库来拉入一个Excel文件,其中每行包含市场、渠道和KPI,每个月有一列,实际目标作为值。我可以使用pd.melt将其解透视为一个更表格化的视图,但我还没有找到任何解决方案允许我将每个月扩展为天。其中,每一天都有与该月天数成比例的目标的一部分,同时保留KPI、市场和渠道。

df = pd.DataFrame([['New', 'Albuquerque', 'Marketing', 34, 34, 34, 35, 35, 36, 36, 36, 37, 40, 40, 40],
                   ['New', 'Boston', 'Marketing', 12, 12, 12, 12, 12, 13, 13, 14, 14, 15, 16, 17],
                   ['Converted', 'Albuquerque', 'Marketing', 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
                   ['Converted', 'Boston', 'Marketing', 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]],
                  columns=['KPI',
                           'Market',
                           'Channel',
                           '2022-01-01',
                           '2022-02-01',
                           '2022-03-01',
                           '2022-04-01',
                           '2022-05-01',
                           '2022-06-01',
                           '2022-07-01',
                           '2022-08-01',
                           '2022-09-01',
                           '2022-10-01',
                           '2022-11-01',
                           '2022-12-01'])

# Set up variables for the melt
index_vars = ['KPI','Market','Channel']
val_vars = df.set_index(index_vars).columns.tolist()

# Unpivot months
df = pd.melt(df,
             id_vars=index_vars,
             value_vars=val_vars,
             var_name='Date',
             value_name='Goal',
             ignore_index=False)

# Force dates to datetime, sort and reset index for a clean view
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df = df.sort_values(by=['KPI','Market','Channel','Date']).reset_index(drop=True)

print(df)

这给了我这样的一个视图:

KPI       Market    Channel       Date  Goal
0   Converted  Albuquerque  Marketing 2022-01-01     5
1   Converted  Albuquerque  Marketing 2022-02-01     5
2   Converted  Albuquerque  Marketing 2022-03-01     5
3   Converted  Albuquerque  Marketing 2022-04-01     5
4   Converted  Albuquerque  Marketing 2022-05-01     5
5   Converted  Albuquerque  Marketing 2022-06-01     5
6   Converted  Albuquerque  Marketing 2022-07-01     5
7   Converted  Albuquerque  Marketing 2022-08-01     5
8   Converted  Albuquerque  Marketing 2022-09-01     5
9   Converted  Albuquerque  Marketing 2022-10-01     5
10  Converted  Albuquerque  Marketing 2022-11-01     5
11  Converted  Albuquerque  Marketing 2022-12-01     5
12  Converted       Boston  Marketing 2022-01-01     2
13  Converted       Boston  Marketing 2022-02-01     2
...

我试着让它吐出这样的东西:

KPI       Market    Channel       Date  Goal
0   Converted  Albuquerque  Marketing 2022-01-01     0.1612903226
1   Converted  Albuquerque  Marketing 2022-01-02     0.1612903226
2   Converted  Albuquerque  Marketing 2022-01-03     0.1612903226
3   Converted  Albuquerque  Marketing 2022-01-04     0.1612903226
4   Converted  Albuquerque  Marketing 2022-01-05     0.1612903226
5   Converted  Albuquerque  Marketing 2022-01-06     0.1612903226
6   Converted  Albuquerque  Marketing 2022-01-07     0.1612903226
7   Converted  Albuquerque  Marketing 2022-01-08     0.1612903226
8   Converted  Albuquerque  Marketing 2022-01-09     0.1612903226
9   Converted  Albuquerque  Marketing 2022-01-10     0.1612903226
10  Converted  Albuquerque  Marketing 2022-01-11     0.1612903226
11  Converted  Albuquerque  Marketing 2022-01-12     0.1612903226
12  Converted       Boston  Marketing 2022-01-13     0.064516129
13  Converted       Boston  Marketing 2022-01-14     0.064516129
...
rm5edbpk

rm5edbpk1#

melt之前的原始 Dataframe 开始可能更容易:

# Keep only date columns and set others to index
out = df.set_index(index_vars)
out.columns = pd.to_datetime(out.columns)

# Expand months to days
new_idx = pd.date_range(out.columns.min(), out.columns.max() + pd.offsets.MonthEnd(0), freq='D')

# Compute the new goal according days in month
out /= out.columns.days_in_month

# Reindex with the days index and fill missing values
out = out.reindex(new_idx, axis=1).ffill(axis=1)

中间输出为:

>>> out
                                 2022-01-01  2022-01-02  2022-01-03  ...  2022-12-29  2022-12-30  2022-12-31
KPI       Market      Channel                                        ...                                    
New       Albuquerque Marketing    1.096774    1.096774    1.096774  ...    1.290323    1.290323    1.290323
          Boston      Marketing    0.387097    0.387097    0.387097  ...    0.548387    0.548387    0.548387
Converted Albuquerque Marketing    0.161290    0.161290    0.161290  ...    0.161290    0.161290    0.161290
          Boston      Marketing    0.064516    0.064516    0.064516  ...    0.064516    0.064516    0.064516

[4 rows x 365 columns]

但是,如果需要预期的输出,可以用途:

out = out.rename_axis(columns='Date').stack().to_frame('Goal').reset_index()

最终输出:

>>> out
            KPI       Market    Channel       Date      Goal
0           New  Albuquerque  Marketing 2022-01-01  1.096774
1           New  Albuquerque  Marketing 2022-01-02  1.096774
2           New  Albuquerque  Marketing 2022-01-03  1.096774
3           New  Albuquerque  Marketing 2022-01-04  1.096774
4           New  Albuquerque  Marketing 2022-01-05  1.096774
...         ...          ...        ...        ...       ...
1455  Converted       Boston  Marketing 2022-12-27  0.064516
1456  Converted       Boston  Marketing 2022-12-28  0.064516
1457  Converted       Boston  Marketing 2022-12-29  0.064516
1458  Converted       Boston  Marketing 2022-12-30  0.064516
1459  Converted       Boston  Marketing 2022-12-31  0.064516

[1460 rows x 5 columns]

相关问题