很长一段时间以来,我一直在维护一个报告,显示不同市场和渠道的目标进度,我一直依赖于GoogleSheets中的一些功能(最常见的是拆分和扁平化),获取月度预算并将其拆分为每日目标,以便与来自另一个系统的数据相结合以获得每日计数,然后在Tableau Desktop中将其聚合到所需的任何时间段(例如,按周、月、年)。添加新市场、渠道等非常挑剔,而且Google Sheets已经变得太大,无法与Tableau连接。我想使用Python来简化这一过程。
我一直在研究的解决方案使用Python的Pandas库来拉入一个Excel文件,其中每行包含市场、渠道和KPI,每个月有一列,实际目标作为值。我可以使用pd.melt将其解透视为一个更表格化的视图,但我还没有找到任何解决方案允许我将每个月扩展为天。其中,每一天都有与该月天数成比例的目标的一部分,同时保留KPI、市场和渠道。
df = pd.DataFrame([['New', 'Albuquerque', 'Marketing', 34, 34, 34, 35, 35, 36, 36, 36, 37, 40, 40, 40],
['New', 'Boston', 'Marketing', 12, 12, 12, 12, 12, 13, 13, 14, 14, 15, 16, 17],
['Converted', 'Albuquerque', 'Marketing', 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
['Converted', 'Boston', 'Marketing', 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]],
columns=['KPI',
'Market',
'Channel',
'2022-01-01',
'2022-02-01',
'2022-03-01',
'2022-04-01',
'2022-05-01',
'2022-06-01',
'2022-07-01',
'2022-08-01',
'2022-09-01',
'2022-10-01',
'2022-11-01',
'2022-12-01'])
# Set up variables for the melt
index_vars = ['KPI','Market','Channel']
val_vars = df.set_index(index_vars).columns.tolist()
# Unpivot months
df = pd.melt(df,
id_vars=index_vars,
value_vars=val_vars,
var_name='Date',
value_name='Goal',
ignore_index=False)
# Force dates to datetime, sort and reset index for a clean view
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df = df.sort_values(by=['KPI','Market','Channel','Date']).reset_index(drop=True)
print(df)
这给了我这样的一个视图:
KPI Market Channel Date Goal
0 Converted Albuquerque Marketing 2022-01-01 5
1 Converted Albuquerque Marketing 2022-02-01 5
2 Converted Albuquerque Marketing 2022-03-01 5
3 Converted Albuquerque Marketing 2022-04-01 5
4 Converted Albuquerque Marketing 2022-05-01 5
5 Converted Albuquerque Marketing 2022-06-01 5
6 Converted Albuquerque Marketing 2022-07-01 5
7 Converted Albuquerque Marketing 2022-08-01 5
8 Converted Albuquerque Marketing 2022-09-01 5
9 Converted Albuquerque Marketing 2022-10-01 5
10 Converted Albuquerque Marketing 2022-11-01 5
11 Converted Albuquerque Marketing 2022-12-01 5
12 Converted Boston Marketing 2022-01-01 2
13 Converted Boston Marketing 2022-02-01 2
...
我试着让它吐出这样的东西:
KPI Market Channel Date Goal
0 Converted Albuquerque Marketing 2022-01-01 0.1612903226
1 Converted Albuquerque Marketing 2022-01-02 0.1612903226
2 Converted Albuquerque Marketing 2022-01-03 0.1612903226
3 Converted Albuquerque Marketing 2022-01-04 0.1612903226
4 Converted Albuquerque Marketing 2022-01-05 0.1612903226
5 Converted Albuquerque Marketing 2022-01-06 0.1612903226
6 Converted Albuquerque Marketing 2022-01-07 0.1612903226
7 Converted Albuquerque Marketing 2022-01-08 0.1612903226
8 Converted Albuquerque Marketing 2022-01-09 0.1612903226
9 Converted Albuquerque Marketing 2022-01-10 0.1612903226
10 Converted Albuquerque Marketing 2022-01-11 0.1612903226
11 Converted Albuquerque Marketing 2022-01-12 0.1612903226
12 Converted Boston Marketing 2022-01-13 0.064516129
13 Converted Boston Marketing 2022-01-14 0.064516129
...
1条答案
按热度按时间rm5edbpk1#
从
melt
之前的原始 Dataframe 开始可能更容易:中间输出为:
但是,如果需要预期的输出,可以用途:
最终输出: