Edge detection in SQL Server query

3pmvbmvn  于 2023-02-18  发布在  SQL Server
关注(0)|答案(1)|浏览(139)

I've got a table where the value in each row can be one of two values, eg 0 or 1. Each row also has an ID identity column.

What's the most efficient way to query the table to return the min and max ID values for each contiguous group of values?

Example:

declare @tbl table (ID INT IDENTITY(1,1), Value INT);

insert into @tbl (Value)
values (1), (1), (1), (0), (0), (1), (0), (0), (1), (1), (1), (1);

Table contents:

ID  Value
1   1
2   1
3   1
4   0
5   0
6   1
7   0
8   0
9   1
10  1
11  1
12  1

What I'd like to output from the query:

GroupID Value MinID MaxID
1       1     1     3
2       0     4     5
3       1     6     6
4       0     7     8
5       1     9     12

I've worked out a query to return these results. However, it's inefficient as it has to run over the table 4x, twice to get the MinID values and twice again for the MaxID values. As the real table I'll be running this query over has over 10 million rows, I'm looking for a more efficient query.

Here's the query I've worked out:

declare @maxID INT;
select @maxID = MAX(ID)
from @tbl;

select min.RowID AS GroupID, min.Value, min.MinID, max.MaxID
from 
    (
        select 
            ROW_NUMBER() OVER (ORDER BY x.ID) AS RowID, 
            x.Value, x.ID as MinID
        from @tbl x
            left join @tbl ld on x.ID = ld.ID + 1
                            and x.Value <> ld.Value
        where x.ID = 1
            or ld.ID is not null
    ) min
    join 
    (
        select 
            ROW_NUMBER() OVER (ORDER BY x.ID) AS RowID, 
            x.Value, x.ID as MaxID
        from @tbl x
            left join @tbl trl on x.ID = trl.ID - 1
                            and x.Value <> trl.Value
        where trl.ID is not null
            or x.ID = @maxID
    ) max
    on min.RowID = max.RowID
order by GroupID;

Is there a better way to achieve the same results?

vsikbqxv

vsikbqxv1#

Try the following which uses lag to first identify where each group of values changes, then a running sum to create a sequential value for each group; with that in place you can then simply aggregate:

declare @tbl table (ID int identity(1,1), Value int);
insert into @tbl (Value)
values (1), (1), (1), (0), (0), (1), (0), (0), (1), (1), (1), (1);

with grp as (
  select *,
      case when Lag(Value) over(order by ID) = Value then 0 else 1 end gp
  from @tbl
), seq as (
  select *, Sum(gp) over(order by ID) s
  from grp
)
select
  s + 1 as GroupId,
  Value,
  Min(ID) MinId,
  Max(ID) MaxId
from seq
group by s + 1, Value
order by GroupId;

fiddle Demo

相关问题