pandas: Using the start and end times of employee shifts, how can I create an array showing how many people are scheduled in 15-minute increments?

Posted by rsaldnfx on 2023-02-17 in Other

Input data: planned employee shifts, given by employee ID, start time, and end time
| Employee_ID | Start_Time | End_Time |
| ------- | ------- | ------- |
| 1 | 202303150100000000000000 | 202303150245000000000000 |
| 2 | 202303150015000000000000 | 202303150700000000000000 |
| 3 | 202303150230000000000000 | 202303160100000000000000 |
Expected output: a DataFrame or array containing the number of people scheduled, summed by date and 15-minute increment (shifts can span more than one day and should carry over)
| Date | 12:00 | 12:15 | 12:30 | 12:45 | 0:00 | 0:15 | 0:30 | 0:45 | 1:00 | 1:15 | 1:30 | 1:45 | 2:00 | 2:15 | 2:30 | 2:45 | 3:00 | ... | 24:00 |
| ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- |
| 2023-03-15 | 0 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 2 | 2 | ... | 1 |
| 2023-03-16 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 |
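So far, I have figured out how to show the data in hourly increments, but this does not work properly when a shift's start or end time is not exactly on the hour (for example, a shift starting at 2:45 is recorded as starting at 2:00).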

Create the hour columns and fill them with NumPy NaN

df = df.assign(**{'0': np.nan, '1': np.nan, '2': np.nan, '3': np.nan, '4': np.nan, '5': np.nan, 
           '6': np.nan, '7': np.nan, '8': np.nan, '9': np.nan, '10': np.nan, '11': np.nan,
          '12': np.nan, '13': np.nan, '14': np.nan, '15': np.nan, '16': np.nan, '17': np.nan,
         '18': np.nan, '19': np.nan, '20': np.nan, '21': np.nan, '22': np.nan, '23': np.nan})

df['0'] = (df['Start_Time']< 1) &  (1 <= df['End_Time'])
df['0'] = df['0'].astype(int)

df['1'] = (df['Start_Time']< 2) &  (2 <= df['End_Time'])
df['1'] = df['1'].astype(int)

df['2'] = (df['Start_Time']< 3) &  (3 <= df['End_Time'])
df['2'] = df['2'].astype(int)

df['3'] = (df['Start_Time']< 4) &  (4 <= df['End_Time'])
df['3'] = df['3'].astype(int)

df['4'] = (df['Start_Time']< 5) &  (5 <= df['End_Time'])
df['4'] = df['4'].astype(int)

df['5'] = (df['Start_Time']< 6) &  (6 <= df['End_Time'])
df['5'] = df['5'].astype(int)

df['6'] = (df['Start_Time']< 7) &  (7 <= df['End_Time'])
df['6'] = df['6'].astype(int)

df['7'] = (df['Start_Time']< 8) &  (8 <= df['End_Time'])
df['7'] = df['7'].astype(int)

df['8'] = (df['Start_Time']< 9) &  (9 <= df['End_Time'])
df['8'] = df['8'].astype(int)

df['9'] = (df['Start_Time']< 10) &  (10 <= df['End_Time'])
df['9'] = df['9'].astype(int)

df['10'] = (df['Start_Time']< 11) &  (11 <= df['End_Time'])
df['10'] = df['10'].astype(int)

df['11'] = (df['Start_Time']< 12) &  (12 <= df['End_Time'])
df['11'] = df['11'].astype(int)

df['12'] = (df['Start_Time']< 13) &  (13 <= df['End_Time'])
df['12'] = df['12'].astype(int)

df['13'] = (df['Start_Time']< 14) &  (14 <= df['End_Time'])
df['13'] = df['13'].astype(int)

df['14'] = (df['Start_Time']< 15) &  (15 <= df['End_Time'])
df['14'] = df['14'].astype(int)

df['15'] = (df['Start_Time']< 16) &  (16 <= df['End_Time'])
df['15'] = df['15'].astype(int)

df['16'] = (df['Start_Time']< 17) &  (17 <= df['End_Time'])
df['16'] = df['16'].astype(int)

df['17'] = (df['Start_Time']< 18) &  (18 <= df['End_Time'])
df['17'] = df['17'].astype(int)

df['18'] = (df['Start_Time']< 19) &  (19 <= df['End_Time'])
df['18'] = df['18'].astype(int)

df['19'] = (df['Start_Time']< 20) &  (20 <= df['End_Time'])
df['19'] = df['19'].astype(int)

df['20'] = (df['Start_Time']< 21) &  (21 <= df['End_Time'])
df['20'] = df['20'].astype(int)

df['21'] = (df['Start_Time']< 22) &  (22 <= df['End_Time'])
df['21'] = df['21'].astype(int)

df['22'] = (df['Start_Time']< 23) &  (23 <= df['End_Time'])
df['22'] = df['22'].astype(int)

df['23'] = (df['Start_Time']< 24) &  (24 <= df['End_Time'])
df['23'] = df['23'].astype(int)

Group the data and sum it by specific date (after parsing and formatting the start and end dates)

df = df.groupby(['Start_Date','End_Date']).sum().reset_index()
df = df.drop(columns=['End_Date'])
df = df.rename(columns={'Start_Date':'Date'})
df['Date'] = pd.to_datetime(df['Date'])
df = df.groupby(['Date']).sum().reset_index()
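
The per-hour comparisons above can be generalized to 15-minute slots by comparing real timestamps instead of hour numbers. The following is only a minimal sketch, assuming `Start_Time` and `End_Time` have already been parsed to datetimes and that a shift covers a slot when `start <= slot < end`; the sample frame and the names `shifts`, `slots`, `on_shift`, and `table` are hypothetical.

```python
import pandas as pd

# Hypothetical sample mirroring the three shifts in the question, already parsed.
shifts = pd.DataFrame({
    "Employee_ID": [1, 2, 3],
    "Start_Time": pd.to_datetime(["2023-03-15 01:00", "2023-03-15 00:15", "2023-03-15 02:30"]),
    "End_Time": pd.to_datetime(["2023-03-15 02:45", "2023-03-15 07:00", "2023-03-16 01:00"]),
})

# One timestamp per 15-minute slot, from midnight of the first day to 23:45 of the last day.
first_day = shifts["Start_Time"].min().normalize()
last_day = shifts["End_Time"].max().normalize()
days = pd.date_range(first_day, last_day, freq="D")
slots = pd.date_range(first_day, last_day + pd.Timedelta(hours=23, minutes=45), freq="15min")

# Compare every shift (rows) against every slot (columns): start <= slot < end.
start = shifts["Start_Time"].to_numpy()[:, None]
end = shifts["End_Time"].to_numpy()[:, None]
slot_values = slots.to_numpy()[None, :]
on_shift = (start <= slot_values) & (slot_values < end)

# Sum over employees, then lay the counts out as one row per date and 96 columns per day.
table = pd.DataFrame(
    on_shift.sum(axis=0).reshape(len(days), 96),
    index=days.date,
    columns=slots[:96].strftime("%H:%M"),
)
print(table.loc[:, "00:00":"03:00"])
```

Because the comparison runs on full timestamps, a shift that crosses midnight is counted on both dates without special handling.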

Answer #1 (htrmnn0y)

The DataFrame you gave as an example does not give a clear idea of the format, so I will make some assumptions here. Suppose the DataFrame you have looks like this:

import pandas as pd

df = pd.DataFrame({
    'Employee id': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
    'Shift start': ['2023-02-14 08:00:00', '2023-02-14 09:30:00', '2023-02-14 12:00:00', 
                    '2023-02-14 13:00:00', '2023-02-15 16:00:00', '2023-02-15 18:00:00',
                    '2023-02-17 10:00:00', '2023-02-17 14:00:00'],
    'Shift end': ['2023-02-14 16:00:00', '2023-02-14 17:30:00', '2023-02-14 18:00:00',
                  '2023-02-14 19:00:00', '2023-02-15 20:00:00', '2023-02-15 22:00:00',
                  '2023-02-17 18:00:00', '2023-02-17 19:00:00']
})

This function does the job, though not in the prettiest way. I am sure there are many ways to optimize it:

def count_employees_per_15_min_interval(df):
    # convert string columns to datetime format
    df['Shift start'] = pd.to_datetime(df['Shift start'])
    df['Shift end'] = pd.to_datetime(df['Shift end'])

    # find minimum and maximum shift start and end times
    min_start_time = df['Shift start'].min().replace(hour=0, minute=0, second=0, microsecond=0)
    max_end_time = df['Shift end'].max().replace(hour=23, minute=59, second=59, microsecond=0)

    # create a list of dates to iterate over
    dates = pd.date_range(start=min_start_time, end=max_end_time, freq='D').date.tolist()

    # initialize an empty dictionary to store the employee counts
    employee_counts = {date: [0] * 96 for date in dates}

    # iterate over each employee's shift and increment the count for each 15-minute interval
    for _, row in df.iterrows():
        start_time = row['Shift start']
        end_time = row['Shift end']
        # calculate the number of 15-minute intervals between the start and end times
        # (pd.Timedelta avoids a separate datetime import)
        num_intervals = int((end_time - start_time) / pd.Timedelta(minutes=15))

        # add the employee to the count for each 15-minute slot, end slot included;
        # looking up the date per slot lets shifts that cross midnight carry over
        for i in range(num_intervals + 1):
            slot = start_time + i * pd.Timedelta(minutes=15)
            employee_counts[slot.date()][slot.hour * 4 + slot.minute // 15] += 1

    # create a DataFrame from the employee counts dictionary
    result = pd.DataFrame.from_dict(employee_counts, orient='index',
                                    columns=[f'{hour:02d}:{minute:02d}' for hour in range(24) for minute in range(0, 60, 15)])

    return result

The output will be (I transposed the result so it fits here):

2023-02-14  2023-02-15  2023-02-16  2023-02-17
00:00   0   0   0   0
00:15   0   0   0   0
00:30   0   0   0   0
00:45   0   0   0   0
01:00   0   0   0   0
01:15   0   0   0   0
01:30   0   0   0   0
01:45   0   0   0   0
02:00   0   0   0   0
02:15   0   0   0   0
02:30   0   0   0   0
02:45   0   0   0   0
03:00   0   0   0   0
03:15   0   0   0   0
03:30   0   0   0   0
03:45   0   0   0   0
04:00   0   0   0   0
04:15   0   0   0   0
04:30   0   0   0   0
04:45   0   0   0   0
05:00   0   0   0   0
05:15   0   0   0   0
05:30   0   0   0   0
05:45   0   0   0   0
06:00   0   0   0   0
06:15   0   0   0   0
06:30   0   0   0   0
06:45   0   0   0   0
07:00   0   0   0   0
07:15   0   0   0   0
07:30   0   0   0   0
07:45   0   0   0   0
08:00   1   0   0   0
08:15   1   0   0   0
08:30   1   0   0   0
08:45   1   0   0   0
09:00   1   0   0   0
09:15   1   0   0   0
09:30   2   0   0   0
09:45   2   0   0   0
10:00   2   0   0   1
10:15   2   0   0   1
10:30   2   0   0   1
10:45   2   0   0   1
11:00   2   0   0   1
11:15   2   0   0   1
11:30   2   0   0   1
11:45   2   0   0   1
12:00   3   0   0   1
12:15   3   0   0   1
12:30   3   0   0   1
12:45   3   0   0   1
13:00   4   0   0   1
13:15   4   0   0   1
13:30   4   0   0   1
13:45   4   0   0   1
14:00   4   0   0   2
14:15   4   0   0   2
14:30   4   0   0   2
14:45   4   0   0   2
15:00   4   0   0   2
15:15   4   0   0   2
15:30   4   0   0   2
15:45   4   0   0   2
16:00   4   1   0   2
16:15   3   1   0   2
16:30   3   1   0   2
16:45   3   1   0   2
17:00   3   1   0   2
17:15   3   1   0   2
17:30   3   1   0   2
17:45   2   1   0   2
18:00   2   2   0   2
18:15   1   2   0   1
18:30   1   2   0   1
18:45   1   2   0   1
19:00   1   2   0   1
19:15   0   2   0   0
19:30   0   2   0   0
19:45   0   2   0   0
20:00   0   2   0   0
20:15   0   1   0   0
20:30   0   1   0   0
20:45   0   1   0   0
21:00   0   1   0   0
21:15   0   1   0   0
21:30   0   1   0   0
21:45   0   1   0   0
22:00   0   1   0   0
22:15   0   0   0   0
22:30   0   0   0   0
22:45   0   0   0   0
23:00   0   0   0   0
23:15   0   0   0   0
23:30   0   0   0   0
23:45   0   0   0   0
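
One possible way to avoid the row-by-row counter is to list every slot each shift covers and then count rows per date and time. This is a rough sketch rather than a tuned implementation. It assumes shift times fall on 15-minute boundaries and treats the end time as exclusive, so a shift ending at 09:00 is not counted in the 09:00 slot (unlike the table above). The sample shifts and the names `step`, `slots`, `slot_frame`, `all_times`, and `result` are hypothetical.

```python
import pandas as pd

# Hypothetical sample using the column names from this answer; shift C crosses midnight.
df = pd.DataFrame({
    "Employee id": ["A", "B", "C"],
    "Shift start": pd.to_datetime(["2023-02-14 08:00", "2023-02-14 08:30", "2023-02-14 22:30"]),
    "Shift end": pd.to_datetime(["2023-02-14 09:00", "2023-02-14 10:00", "2023-02-15 00:30"]),
})

step = pd.Timedelta(minutes=15)

# Every 15-minute slot covered by every shift; stopping one step early makes the end exclusive.
slots = pd.DatetimeIndex([
    slot
    for start, end in zip(df["Shift start"], df["Shift end"])
    for slot in pd.date_range(start, end - step, freq=step)
])

# Count slots per (date, time of day) and spread the times into columns.
slot_frame = pd.DataFrame({"Date": slots.date, "Time": slots.strftime("%H:%M")})
all_times = pd.date_range("2023-01-01", periods=96, freq=step).strftime("%H:%M")
result = (
    slot_frame.groupby(["Date", "Time"]).size()
    .unstack(fill_value=0)
    .reindex(columns=all_times, fill_value=0)  # keep all 96 columns, even empty ones
)
print(result.T[result.T.sum(axis=1) > 0])
```

Each slot carries its own date, so the overnight shift lands partly on 2023-02-14 and partly on 2023-02-15.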

Answer #2 (rsl1atfo)

IIUC, you can use a combination of melt, groupby, and pivot. I have included the intermediate states of the DataFrame:

df=pd.DataFrame({'Employee_ID': {0: 1, 1: 2, 2: 3}, 'Start_Time': {0: '202303150100000000000000', 1: '202303150015000000000000', 2: '202303150230000000000000'}, 'End_Time': {0: '202303150245000000000000', 1: '202303150700000000000000', 2: '202303160100000000000000'}})

# keep the first 14 characters (YYYYMMDDHHMMSS) and parse them as real datetime values;
# slicing the string avoids float rounding on these 24-digit numbers
df['Start_Time'] = pd.to_datetime(df['Start_Time'].str[:14], format='%Y%m%d%H%M%S')
df['End_Time'] = pd.to_datetime(df['End_Time'].str[:14], format='%Y%m%d%H%M%S')

# create a melt, basically stacking the column Start_Time and End_Time
df = pd.melt(df, id_vars=['Employee_ID'], value_vars=['Start_Time', 'End_Time']).drop(columns='variable').set_index('value')
print(df)
#                     Employee_ID
# value                           
# 2023-03-15 01:00:00            1
# 2023-03-15 00:15:00            2
# 2023-03-15 02:30:00            3
# 2023-03-15 02:45:00            1
# 2023-03-15 07:00:00            2
# 2023-03-16 01:00:00            3

# Group this dataframe by Employee_ID, while resampling the datetimes to 15 minute intervals
# ('15min' is the current spelling of the older '15T' alias; the end slot is included).
df = df.groupby('Employee_ID').resample('15min').ffill().reset_index(drop=True, level=0).reset_index()
print(df)
#                   value  Employee_ID
# 0   2023-03-15 01:00:00            1
# 1   2023-03-15 01:15:00            1
# 2   2023-03-15 01:30:00            1
# 3   2023-03-15 01:45:00            1
# 4   2023-03-15 02:00:00            1
# ..                  ...          ...
# 122 2023-03-16 00:00:00            3
# 123 2023-03-16 00:15:00            3
# 124 2023-03-16 00:30:00            3
# 125 2023-03-16 00:45:00            3
# 126 2023-03-16 01:00:00            3

# Group the df by 'value' (which has all the dates) and extract date and time columns
df = df.groupby('value').count().rename(columns={'Employee_ID': 'Employee_Count'})
df['Date'] = df.index.date
df['Time'] = df.index.time
print(df)
#                      Employee_Count        Date      Time
# value                                                    
# 2023-03-15 00:15:00               1  2023-03-15  00:15:00
# ...
# 2023-03-16 00:45:00               1  2023-03-16  00:45:00
# 2023-03-16 01:00:00               1  2023-03-16  01:00:00

# Finally, pivot the dataframe, taking the Time column as columns.
df = df.pivot(columns='Time', index='Date', values='Employee_Count').fillna(0)
df
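
The melt step stacks start and end times into one column. As a variation on that idea, the stacked times can be treated as +1/-1 events and accumulated with a running total. This is only a sketch under stated assumptions: times are already parsed, they fall on 15-minute boundaries, and the end time is exclusive (which matches the 3-then-2 pattern at 2:30 and 2:45 in the question's expected output). The names `events`, `grid`, `counts`, and `table` are hypothetical.

```python
import pandas as pd

# The three shifts from the question, already parsed to datetimes.
df = pd.DataFrame({
    "Employee_ID": [1, 2, 3],
    "Start_Time": pd.to_datetime(["2023-03-15 01:00", "2023-03-15 00:15", "2023-03-15 02:30"]),
    "End_Time": pd.to_datetime(["2023-03-15 02:45", "2023-03-15 07:00", "2023-03-16 01:00"]),
})

# +1 when a shift starts, -1 when it ends; events sharing a timestamp are summed.
events = pd.concat([
    pd.Series(1, index=pd.DatetimeIndex(df["Start_Time"])),
    pd.Series(-1, index=pd.DatetimeIndex(df["End_Time"])),
]).groupby(level=0).sum()

# Place the events on a full 15-minute grid and take the running total.
grid = pd.date_range(
    events.index.min().normalize(),
    events.index.max().normalize() + pd.Timedelta(hours=23, minutes=45),
    freq="15min",
)
counts = events.reindex(grid, fill_value=0).cumsum()

# One row per date, one column per time of day.
table = pd.DataFrame({
    "Date": grid.date,
    "Time": grid.strftime("%H:%M"),
    "Count": counts.to_numpy(),
}).pivot(index="Date", columns="Time", values="Count")
print(table.loc[:, "00:00":"03:00"])
```

The running total only touches each shift twice, so it may scale better than expanding every shift into its individual slots.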
