Input data: scheduled employee shifts, given by employee ID, start time, and end time
| Employee_ID | Start_Time | End_Time |
| ------- | ------- | ------- |
| 1 | 202303 | 2023-03-15 02:45:00 |
| 2 | 2023-03-15 00:15:00 | 2023-03-15 07:00:00 |
| 3 | 2023-03-15 02:30:00 | 202303 |
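Expected output: a DataFrame or array containing the number of people scheduled, summed by date and 15-minute increment (shifts can span more than one day and should carry over into the next day)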
| Date | 12 | 12:15 | 12:30 | 12:45 | 0 | 0:15 | 0:30 | 0:45 | 1 | 1:15 | 1:30 | 1:45 | 2 | 2:15 | 2:30 | 2:45 | 3 | ... | 24 |
| ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- |
| 2023-03-15 | 0 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 2 | 2 | ... | 1 |
| 2023-03-16 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 |
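So far I have worked out how to show the data in hourly increments, but this approach breaks when a shift does not start or end exactly on the hour (for example, a shift starting at 2:45 is recorded as starting at 2:00).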
Create the hour columns and fill them with NumPy NaN:
import numpy as np
import pandas as pd

# One column per hour of the day ('0' to '23'), initially NaN
df = df.assign(**{str(h): np.nan for h in range(24)})
# Hour h is flagged 1 when the shift starts before h + 1 and ends at or after h + 1
# (the comparisons treat Start_Time and End_Time as numeric hours)
for h in range(24):
    df[str(h)] = ((df['Start_Time'] < h + 1) & (h + 1 <= df['End_Time'])).astype(int)
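Group the data and sum it by date (after parsing and formatting the start and end dates):
# Sum the hourly flags for each (Start_Date, End_Date) pair
df = df.groupby(['Start_Date','End_Date']).sum().reset_index()
# Keep only the start date and use it as the reporting date
df = df.drop(columns=['End_Date'])
df = df.rename(columns={'Start_Date':'Date'})
df['Date'] = pd.to_datetime(df['Date'])
# Sum again so that each date has a single row
df = df.groupby(['Date']).sum().reset_index()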
2 Answers
htrmnn0y1#
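The DataFrame you gave as an example doesn't give a clear idea of the format, so I'll make some assumptions here. Suppose the DataFrame you have looks like this:
This function does the job, though not in the prettiest way. I'm sure there are plenty of ways to optimize it:
The output will be (I transposed the result so that it fits here):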
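A minimal sketch of a function along these lines, offered as an illustration rather than as the answer's original code. It assumes `Start_Time` and `End_Time` are already parsed as datetimes, and that an employee counts in every 15-minute slot from the (floored) shift start up to, but not including, the shift end. The name `staff_per_slot` and the sample shifts are hypothetical; in particular, the start of shift 1 and the end of shift 3 are made up.

```python
import pandas as pd

# Hypothetical sample shifts (assumed values, not taken from the question)
shifts = pd.DataFrame({
    "Employee_ID": [1, 2, 3],
    "Start_Time": pd.to_datetime(
        ["2023-03-15 00:15", "2023-03-15 00:15", "2023-03-15 02:30"]),
    "End_Time": pd.to_datetime(
        ["2023-03-15 02:45", "2023-03-15 07:00", "2023-03-16 01:00"]),
})


def staff_per_slot(shifts, start_col="Start_Time", end_col="End_Time"):
    """Count employees on shift for each date and 15-minute slot."""
    stamps = []
    for start, end in zip(shifts[start_col], shifts[end_col]):
        # Slot starts covered by this shift, aligned to the 15-minute grid
        rng = pd.date_range(start.floor("15min"), end, freq="15min")
        # The slot beginning exactly at the shift end is not worked
        stamps.extend(rng[rng < end])
    stamps = pd.Series(stamps)

    # Rows are calendar dates, columns are slot start times (HH:MM)
    table = pd.crosstab(stamps.dt.date, stamps.dt.strftime("%H:%M"),
                        rownames=["Date"], colnames=["Slot"])

    # Make sure all 96 slots of the day appear, even if nobody works them
    all_slots = pd.date_range("2000-01-01", periods=96,
                              freq="15min").strftime("%H:%M")
    return table.reindex(columns=all_slots, fill_value=0)


result = staff_per_slot(shifts)
print(result.T)  # transposed, so that the 96 slots run down the page
```

Because each shift is expanded into its own timestamps before counting, a shift that crosses midnight simply contributes slots to two dates, so nothing extra is needed to carry it over.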
rsl1atfo2#
IIUC, you can use a combination of melt, groupby, and pivot. I have included the intermediate states of the DataFrame:
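One way the melt, groupby, and pivot steps could fit together is sketched below. This is an assumption about the approach, not the answer's own code: each shift start is treated as +1 and each shift end as -1, and a running total over a 15-minute grid gives the headcount. The column names `edge`, `time`, `delta`, `staff`, and `Slot` are illustrative, and the sample shifts are hypothetical (the start of shift 1 and the end of shift 3 are made up).

```python
import pandas as pd

# Hypothetical sample shifts (assumed values, not taken from the question)
shifts = pd.DataFrame({
    "Employee_ID": [1, 2, 3],
    "Start_Time": pd.to_datetime(
        ["2023-03-15 00:15", "2023-03-15 00:15", "2023-03-15 02:30"]),
    "End_Time": pd.to_datetime(
        ["2023-03-15 02:45", "2023-03-15 07:00", "2023-03-16 01:00"]),
})

# Align shifts to the 15-minute grid: starts round down, ends round up
aligned = shifts.assign(
    Start_Time=shifts["Start_Time"].dt.floor("15min"),
    End_Time=shifts["End_Time"].dt.ceil("15min"),
)

# 1) melt: one row per shift edge (a start or an end)
edges = aligned.melt(id_vars="Employee_ID",
                     value_vars=["Start_Time", "End_Time"],
                     var_name="edge", value_name="time")
edges["delta"] = edges["edge"].map({"Start_Time": 1, "End_Time": -1})
print(edges)

# 2) groupby: net change in headcount at each timestamp,
#    then a running total over every 15-minute slot of the covered days
change = edges.groupby("time")["delta"].sum()
first_day = change.index.min().normalize()
last_day = change.index.max().normalize()
n_days = (last_day - first_day).days + 1
grid = pd.date_range(first_day, periods=96 * n_days, freq="15min")
headcount = change.reindex(grid, fill_value=0).cumsum()
print(headcount.head(12))

# 3) pivot: dates as rows, slot start times (HH:MM) as columns
table = (
    headcount.rename("staff").rename_axis("time").reset_index()
    .assign(Date=lambda d: d["time"].dt.date,
            Slot=lambda d: d["time"].dt.strftime("%H:%M"))
    .pivot(index="Date", columns="Slot", values="staff")
)
print(table)
```

The running total avoids expanding every shift into individual rows, which may matter for large schedules. The trade-off is that every start and end has to sit on the 15-minute grid, which is why the times are floored and rounded up first.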