numpy 如何从开始日期、结束日期和分类变量开始按月份分组

8dtrkrch  于 2023-04-21  发布在  其他
关注(0)|答案(1)|浏览(127)

有一个HR列表:

Department       Start         End  Salary per month
0      Sales  01.01.2020  30.04.2020              1000
1     People  01.05.2020  30.07.2022              3000
2  Marketing  01.02.2020  30.12.2099              3200
3      Sales  01.03.2020  01.08.2023              1200
4   Engineer  01.04.2020  30.12.2099              3500

使用Python我想知道如何按月(从第一个员工开始)和部门分组工资总额
我曾尝试按日期范围对工资进行分组,但不知道如何与部门分组并在每列中显示每个部门

month_starts = pd.date_range(data.start.min(),data.end.max(),freq = 'MS').to_numpy()
contained = np.logical_and(
    np.greater_equal.outer(month_starts, data['start'].to_numpy()),
    np.less.outer(month_starts,data['end'].to_numpy())
)
masked = np.where(contained, np.broadcast_to(data[['salary_per_month']].transpose(), contained.shape),np.nan)
df = pd.DataFrame(masked, index = month_starts).agg('sum',axis=1).to_frame().reset_index()
df.columns = ['month', 'total_cost']

预期输出:
enter image description here

iq3niunx

iq3niunx1#

您可以repeat您的行并递增句点,然后pivot_table

s = pd.to_datetime(df['Start'], dayfirst=True).dt.to_period('M')
e = pd.to_datetime(df['End'], dayfirst=True).dt.to_period('M')

n = (e-s).apply(lambda x: x.n)

(df.assign(Date=s)
   .loc[df.index.repeat(n+1)]
   .assign(Date=lambda d: d['Date']+d.groupby(level=0).cumcount())
   .pivot_table(index='Date', columns='Department', values='Salary per month',
                aggfunc='sum', fill_value=0)
)

输出:

Department  Engineer  Marketing  People  Sales
Date                                          
2020-01            0          0       0   1000
2020-02            0       3200       0   1000
2020-03            0       3200       0   2200
2020-04         3500       3200       0   2200
2020-05         3500       3200    3000   1200
...              ...        ...     ...    ...
2099-08         3500       3200       0      0
2099-09         3500       3200       0      0
2099-10         3500       3200       0      0
2099-11         3500       3200       0      0
2099-12         3500       3200       0      0

[960 rows x 4 columns]

相关问题