SQL Server SQL query to aggregate data into shifts based on time of events?

t3psigkw  于 12个月前  发布在  其他
关注(0)|答案(3)|浏览(104)

Updated to specify using MS SQL Server.

I have a table that has one row for each action performed by a worker, each action with a date/time stamp.

Action_Log
| Action_ID | Action_DTTM |
| ------------ | ------------ |
| 1 | 04/19/2023 11:15:37 |
| 2 | 04/19/2023 12:16:14 |
| 3 | 04/19/2023 12:16:14 |
| 4 | 04/19/2023 14:21:03 |
| 5 | 04/19/2023 16:55:41 |
| 6 | 04/19/2023 18:10:43 |
| 7 | 04/19/2023 20:24:15 |
| 8 | 04/19/2023 22:43:29 |
| 9 | 04/20/2023 13:21:04 |
| 10 | 04/20/2023 13:55:57 |
| 11 | 04/20/2023 13:55:57 |
| 12 | 04/20/2023 16:28:11 |
| 13 | 04/20/2023 17:35:45 |
| 14 | 04/20/2023 18:48:01 |
| 15 | 04/20/2023 19:32:26 |
| 16 | 04/20/2023 20:16:39 |
| 17 | 04/20/2023 21:03:25 |

I need a query that will aggregate the actions into shifts. This table above represents actions during 2 shifts by a single worker. There are a few different shift types a worker might work - ShiftA is 11am-11pm, ShiftB is 1-9pm - but I don't have a shift schedule telling me when workers were scheduled for each type of shift. All I have is the action log.

Example of the kind of output I'm hoping for:
| Action_ID | Shift_Number | Shift_Type |
| ------------ | ------------ | ------------ |
| 1 | 1 | ShiftA |
| 2 | 1 | ShiftA |
| 3 | 1 | ShiftA |
| 4 | 1 | ShiftA |
| 5 | 1 | ShiftA |
| 6 | 1 | ShiftA |
| 7 | 1 | ShiftA |
| 8 | 1 | ShiftA |
| 9 | 2 | ShiftB |
| 10 | 2 | ShiftB |
| 11 | 2 | ShiftB |
| 12 | 2 | ShiftB |
| 13 | 2 | ShiftB |
| 14 | 2 | ShiftB |
| 15 | 2 | ShiftB |
| 16 | 2 | ShiftB |
| 17 | 2 | ShiftB |

I'm not even sure if this can be accomplished in SQL? Also not sure if my sample tables are rendering in a legible way. I can add screenshots if that's easier to read

gwbalxhn

gwbalxhn1#

OP, action id 17 goes over shift by 225 seconds. Charlieface's answer makes assumptions about a first shift's action (magic numbers). My approach gets this naturally. Code has all the comments to explain everything. I also added to your sample set of data to get data for all 4 shifts to prove this works. Also, my solution supports overlapping shifts, but makes the assumption there is only one shift working any given second. If that is not the case, then there is absolutely no way of doing this.

My suggestion is to have shift times in a user table (not a variable), similar to my setup, so you can easily update that table instead of updating your query with magic numbers. You never want to have a production query using magic numbers. You can even go further and add two datetime columns - date_begin and date_end describing date ranges a shift is valid. In this setup, you would leave the date_end null until it becomes obsolete, then add new rows as needed to describe the new shift times. The reason you would want to do this is because if shift times change, and you need to query back in time, then you will have to either do multiple queries to support the different shift time change, or, use the updated shift times table to support all of that gracefully.

Unfortunately, the result set is large (many columns to show supporting data, and 34 rows deep), and will not post a picture until asked. I also do not use db fiddle. I believe some on here are tightly related to whoever owns it and drives traffic to it.

declare @shift_times table
(
    shift_number tinyint not null,
    shift_type varchar(10) not null,
    time_start time not null,
    time_end time not null
)

declare @action_log table
(
    action_id int not null,
    action_dttm datetime not null
)

-- it is nice to have the action_dttm as a date as well for comparison
-- we use this table in place of @action_log after populating
declare @action_log_expanded table
(
    action_id int not null,
    action_dttm datetime not null,
    action_date date not null
)

-- this holds the min and max action dttm by day (the x means by, like this by that)
declare @min_max_dttm_x_date table
(
    action_date date not null,
    min_dttm_x_day datetime not null,
    max_dttm_x_day datetime not null
)

-- this table holds data guaranteed to match the shift to the STARTING day worked
-- based on comparing shift start time to the first day's transaction (look at the top 1 order by clause)
-- so if a shift went over midnight, this table will hold its action date as 1 day less than the action log dttm
-- this will not account for certain oddities that we make up for later
-- if you select from this table after it is insert into, you will see what i mean
declare @guaranteed_shift_x_min_action_dttm_x_day table
(
    shift_number tinyint not null,
    shift_type varchar(10) not null,
    time_start time not null,
    time_end time not null,
    action_date date not null,
    min_dttm_x_day datetime not null,
    max_dttm_x_day datetime not null,
    action_date_shift_start datetime not null,
    action_date_shift_end datetime not null,
    action_date_before_midnight datetime not null
)

-- in your sample data, action id 17 went over by 3.x minutes from shift B end
-- this variable allows you some flexibility in allowing overage.
-- in your sample data (which i did not change, but added to it),
-- this record would have gone to shift C, but this allows it to 
-- allocate to shift B. If it does not fit the criteria, 
-- the record is then omitted from the result set because it matches no shift.
-- your business would need to make a decision about how to handle these. 
declare @max_seconds_after_shift int = 300 -- 5 minutes

insert  into @shift_times
select  1, 'ShiftA', '11:00:00', '23:00:00' union all
select  2, 'ShiftB', '13:00:00', '21:00:00' union all
select  3, 'ShiftC', '22:00:00', '06:00:00' union all
select  4, 'ShiftD', '18:00:00', '06:00:00'

insert  into @action_log
select 1, '04/19/2023 11:15:37' union all
select 2, '04/19/2023 12:16:14' union all
select 3, '04/19/2023 12:16:14' union all
select 4, '04/19/2023 14:21:03' union all
select 5, '04/19/2023 16:55:41' union all
select 6, '04/19/2023 18:10:43' union all
select 7, '04/19/2023 20:24:15' union all
select 8, '04/19/2023 22:43:29' union all
select 9, '04/20/2023 13:21:04' union all
select 10, '04/20/2023 13:55:57' union all
select 11, '04/20/2023 13:55:57' union all
select 12, '04/20/2023 16:28:11' union all
select 13, '04/20/2023 17:35:45' union all
select 14, '04/20/2023 18:48:01' union all
select 15, '04/20/2023 19:32:26' union all
select 16, '04/20/2023 20:16:39' union all
select 17, '04/20/2023 21:03:25' union all
select 18, '04/21/2023 22:21:04' union all
select 19, '04/21/2023 22:55:57' union all
select 20, '04/21/2023 22:55:57' union all
select 21, '04/22/2023 01:28:11' union all
select 22, '04/22/2023 02:35:45' union all
select 23, '04/22/2023 03:48:01' union all
select 24, '04/22/2023 04:32:26' union all
select 25, '04/22/2023 05:16:39' union all
select 26, '04/22/2023 06:03:25' union all
select 27, '04/23/2023 18:15:37' union all
select 28, '04/23/2023 19:16:14' union all
select 29, '04/23/2023 19:16:14' union all
select 30, '04/23/2023 21:21:03' union all
select 31, '04/23/2023 23:55:41' union all
select 32, '04/24/2023 01:10:43' union all
select 33, '04/24/2023 03:24:15' union all
select 34, '04/24/2023 05:43:29' 

-- expand the action log
insert  into @action_log_expanded
select  ac.action_id,
        ac.action_dttm,
        convert(date, ac.action_dttm) as action_date
from    @action_log ac

-- get the min and max action dttm per day
insert  into @min_max_dttm_x_date
select  top 1 with ties convert(date, a.action_dttm) as action_date,
        min(a.action_dttm) over (partition by convert(date, a.action_dttm) order by convert(date, a.action_dttm)) as min_dttm_x_day,
        max(a.action_dttm) over (partition by convert(date, a.action_dttm) order by convert(date, a.action_dttm)) as max_dttm_x_day
from    @action_log a
order by    ROW_NUMBER() over (partition by convert(date, a.action_dttm) order by a.action_dttm) 

-- based on what we know, we can compare shift start time and the first day's transaction
-- to atleast get some guaranteed knowns
insert  into @guaranteed_shift_x_min_action_dttm_x_day
select  top 1 with ties 
        s.shift_number,
        s.shift_type,
        s.time_start,
        s.time_end,
        d.action_date,
        d.min_dttm_x_day,
        d.max_dttm_x_day,
        cast(d.action_date as datetime) + cast(s.time_start as datetime) as action_date_shift_start,
        cast(d.action_date as datetime) + cast(s.time_end as datetime) as action_date_shift_end,
        cast(d.action_date as datetime) + cast('23:59:59' as datetime) as action_date_before_midnight
from    @shift_times s
                inner join
        @min_max_dttm_x_date d on 
                    s.time_start <= convert(time, d.min_dttm_x_day)  -- purposefully do not check the end time because data shows (action id 17) that users can go over shift)
order by    ROW_NUMBER() over (partition by d.action_date order by abs(datediff(second, convert(time, s.time_start), convert(time, d.min_dttm_x_day)))) asc 


-- join the action log
-- for the dates that are not on here, we can simply go into the next day in the join within the action log by adding the action date + 1 and checking for time
select  *
from    @guaranteed_shift_x_min_action_dttm_x_day g
            inner join
        @action_log_expanded e on
                (
                    g.action_date = e.action_date and -- the shift's first transaction date equals the action dttm
                    (
                        g.time_start < g.time_end and -- this shift does not overlap midnight
                        (
                            (   -- the action dttm betwen the shift times - just adding time to a date time
                                e.action_dttm between g.action_date_shift_start and g.action_date_shift_end
                            ) or
                            (
                                -- existence and flexibility of time outside of shift end - see your action log id 17
                                -- this is adjustable
                                datediff(second, g.action_date_shift_end, e.action_dttm) between 0 and @max_seconds_after_shift
                            )
                        )
                    ) or
                    (
                        g.time_start > g.time_end and -- shift spans two days but this join is for start day
                        e.action_dttm between g.action_date_shift_start and g.action_date_before_midnight
                    )                   
                ) or
                (
                    -- the shift's first transaction happened on the previous day
                    cast(dateadd(day, 1, g.action_date) as date) = e.action_date and
                    g.time_start > g.time_end and
                    (
                        (
                            e.action_dttm between cast(e.action_date as datetime) and dateadd(day, 1, g.action_date_shift_end)
                        ) or
                        (
                            datediff(second, dateadd(day, 1, g.action_date_shift_end), e.action_dttm) between 0 and @max_seconds_after_shift

                        )

                    )
                )
order by    e.action_id asc
xggvc2p6

xggvc2p62#

This is a classic Gaps-and-Islands problem.

You need to first work out the beginning of each shift (by looking for a gap of more than 8 hours). Then you need to apply that calculation across the whole shift, and then based on that you can assign a shift.

I note that your shifts are overlapping, so the results might not be perfectly accurate.

One option is to assign a conditional count of shift starts which gives you a number for each shift, then taking the minimum Action_DTTM per shift.

WITH Previous AS (
    SELECT *,
      PrevDate = LAG(Action_DTTM) OVER (ORDER BY Action_DTTM)
    FROM Action_Log al
),
Grouped AS (
    SELECT *,
      Shift_Number = COUNT(CASE WHEN PrevDate > DATEADD(hour, -8, Action_DTTM) THEN NULL ELSE 1 END)
                 OVER (ORDER BY Action_DTTM ROWS UNBOUNDED PRECEDING)
    FROM Previous
),
PerGroup AS (
    SELECT *,
      FirstAction = DATEPART(hour, MIN(Action_DTTM) OVER (PARTITION BY Shift_Number))
    FROM Grouped
)
SELECT
  Action_ID,
  Shift_Number,
  Shift_Type =
    CASE WHEN FirstAction BETWEEN 11 AND 12
            THEN 'A'
         WHEN FirstAction BETWEEN 13 AND 17
            THEN 'B'
         WHEN FirstAction BETWEEN 18 AND 21
            THEN 'D'
         ELSE 'C'
    END
FROM PerGroup;

A better (more efficient) option might be to just copy over the Action_DTTM only when it is the start of a shift, then use that to create the shift number as well as applying it to all other columns.

WITH Previous AS (
    SELECT *,
      ShiftStart = CASE WHEN LAG(Action_DTTM) OVER (ORDER BY Action_DTTM)
                   > DATEADD(hour, -8, Action_DTTM) THEN NULL ELSE Action_DTTM END
    FROM Action_Log al
),
Grouped AS (
    SELECT *,
      AllShiftStart =  MAX(ShiftStart) OVER (ORDER BY Action_DTTM ROWS UNBOUNDED PRECEDING),
      Shift_Number = COUNT(ShiftStart) OVER (ORDER BY Action_DTTM ROWS UNBOUNDED PRECEDING)
    FROM Previous
),
PerGroup AS (
    SELECT *,
      FirstAction = DATEPART(hour, AllShiftStart)
    FROM Grouped
)
SELECT
  Action_ID,
  Shift_Number,
  Shift_Type =
    CASE WHEN FirstAction BETWEEN 11 AND 12
            THEN 'A'
         WHEN FirstAction BETWEEN 13 AND 17
            THEN 'B'
         WHEN FirstAction BETWEEN 18 AND 21
            THEN 'D'
         ELSE 'C'
    END
FROM PerGroup;

db<>fiddle

jckbn6z7

jckbn6z73#

If you truly have data without long breaks then I'd think you would want to find those clusters and then use minimum (and/or maximum) times within each span.

with data as (
    select *,
        case when datediff(hour, Action_DTTM,
                 lead(Action_DTTM) over (partition by Worker_ID order by Action_DTTM)) > 3
             then 1 else 0 end as shift_Break
    from T
), shifts as (
    select *,
        sum(shift_Break) over (partition by Worker_ID order by Action_DTTM) as shift_Counter
    from data
), shift_Agg
    select *,
        datepart(hour, min(Action_DTTM) over (partition by Worker_ID, shift_Counter)) as first_Hour
    from shifts
)
select Worker_ID, Action, DTTM
    case when first_Hour < 15 then -- A or B
        case when first_Hour < 13 then 'A' else 'B' end else
        case when first_Hour < 21 then 'C' else 'D' end
    end as shift_Name
from shift_Agg;

I assume there is potential to start shifts early and late so I tried fudge the ranges a bit. Clearly you'll want to play with that logic or make it slightly more robust by looking at end times or total durations. You also appear to be able to distinguish workers which I assume is a worker ID of some kind. All in all this is nearly the same as Charlie's after reading it over.

相关问题