sql挑战/难题:给定堆栈跟踪-如何在每个时间点找到顶部元素?

4ioopgfo  于 2021-06-28  发布在  Hive
关注(0)|答案(5)|浏览(305)

我的实际用例是合并嵌套范围。我画了一些草图,然后我看到了从堆栈push和pop操作开始到结束的范围之间的相似性。我明白解决这个问题也会解决原来的问题。
op栏实际上可以从问题中删除。当val为null时,则为pop操作,否则为push操作。

拼图

表stack\u trace包含以下列:
i—表示时间点的整数值。
op-2可能的操作:i(“输入”/“按下”)和o(“输出”/“弹出”)。
val—由“in”/“push”操作插入的值,或为空表示“out”/“pop”操作。
目标是在每个时间点(i)找到堆栈顶部的值。
例如
(空值在这里表示为空格)
数据:

i   op  val 
--  --  --  
1   I   A   
2   I   B   
3   O
4   I   C
5   O    
6   O

要求结果:

i   top_of_stack_val
--  ----------------
1   A
2   B
3   A
4   C
5   A
6

要求

解决方案应该是单个sql查询(子查询也可以)。
只允许下列子句:select、from、where、group by、having、order by。
不允许使用with子句(cte-common table expression)。
不允许使用t-sql、pl/sql等。
不允许使用自定义函数。
不允许使用变量。

样本数据

create table stack_trace
(
    i       int
   ,op      char(1)
   ,val     char(1)
)
;

insert into stack_trace (i,op,val) values (1,'I','A');
insert into stack_trace (i,op,val) values (2,'I','B');
insert into stack_trace (i,op,val) values (3,'I','C');
insert into stack_trace (i,op,val) values (4,'I','D');
insert into stack_trace (i,op,val) values (5,'I','E');
insert into stack_trace (i,op)     values (6,'O');
insert into stack_trace (i,op)     values (7,'O');
insert into stack_trace (i,op)     values (8,'O');
insert into stack_trace (i,op,val) values (9,'I','F');
insert into stack_trace (i,op)     values (10,'O');
insert into stack_trace (i,op,val) values (11,'I','G');
insert into stack_trace (i,op,val) values (12,'I','H');
insert into stack_trace (i,op)     values (13,'O');
insert into stack_trace (i,op)     values (14,'O');
insert into stack_trace (i,op,val) values (15,'I','I');
insert into stack_trace (i,op,val) values (16,'I','J');
insert into stack_trace (i,op,val) values (17,'I','K');
insert into stack_trace (i,op,val) values (18,'I','L');
insert into stack_trace (i,op,val) values (19,'I','M');
insert into stack_trace (i,op)     values (20,'O');
insert into stack_trace (i,op,val) values (21,'I','N');
insert into stack_trace (i,op)     values (22,'O');
insert into stack_trace (i,op,val) values (23,'I','O');
insert into stack_trace (i,op)     values (24,'O');
insert into stack_trace (i,op,val) values (25,'I','P');
insert into stack_trace (i,op)     values (26,'O');
insert into stack_trace (i,op)     values (27,'O');
insert into stack_trace (i,op,val) values (28,'I','Q');
insert into stack_trace (i,op,val) values (29,'I','R');
insert into stack_trace (i,op)     values (30,'O');
insert into stack_trace (i,op)     values (31,'O');
insert into stack_trace (i,op)     values (32,'O');
insert into stack_trace (i,op)     values (33,'O');
insert into stack_trace (i,op)     values (34,'O');
insert into stack_trace (i,op)     values (35,'O');
insert into stack_trace (i,op,val) values (36,'I','S');
insert into stack_trace (i,op)     values (37,'O');
insert into stack_trace (i,op)     values (38,'O');
insert into stack_trace (i,op,val) values (39,'I','T');
insert into stack_trace (i,op,val) values (40,'I','U');
insert into stack_trace (i,op)     values (41,'O');
insert into stack_trace (i,op,val) values (42,'I','V');
insert into stack_trace (i,op,val) values (43,'I','W');
insert into stack_trace (i,op,val) values (44,'I','X');
insert into stack_trace (i,op)     values (45,'O');
insert into stack_trace (i,op)     values (46,'O');
insert into stack_trace (i,op,val) values (47,'I','Y');
insert into stack_trace (i,op)     values (48,'O');
insert into stack_trace (i,op)     values (49,'O');
insert into stack_trace (i,op,val) values (50,'I','Z');
insert into stack_trace (i,op)     values (51,'O');
insert into stack_trace (i,op)     values (52,'O');

要求的结果

i   top_of_stack_val
--  ----------------
1   A
2   B
3   C
4   D
5   E
6   D
7   C
8   B
9   F
10  B
11  G
12  H
13  G
14  B
15  I
16  J
17  K
18  L
19  M
20  L
21  N
22  L
23  O
24  L
25  P
26  L
27  K
28  Q
29  R
30  Q
31  K
32  J
33  I
34  B
35  A
36  S
37  A
38  
39  T
40  U
41  T
42  V
43  W
44  X
45  W
46  V
47  Y
48  V
49  T
50  Z
51  T
52
46scxncf

46scxncf1#

就我个人而言,我怀疑您最终是否会找到可以在所有sqlserver、teradata、postgres和oracle中使用的sql,并且如果表非常大,那么它的性能是可以接受的。
SQLServer解决方案(演示)如下

SELECT i,
       SUBSTRING(MAX(FORMAT(i, 'D10') + val) OVER (PARTITION BY Pos ORDER BY i 
                                                     ROWS UNBOUNDED PRECEDING), 11, 8000)
FROM   (SELECT st.*,
               sum(CASE WHEN op = 'I' THEN 1 ELSE -1 END) 
                  OVER (ORDER BY i ROWS UNBOUNDED PRECEDING) AS pos
        FROM   stack_trace st) t1
ORDER  BY i;
zvms9eto

zvms9eto2#

这实际上是一个有趣的问题。我要做的是跟踪堆栈中每个元素的位置。您可以使用累积和来执行此操作:

select st.*,
       sum(case when op = 'I' then 1 else -1 end) over (order by i) as pos
from stack_trace st;

唉,在这一点上,我认为您需要一个相当复杂的联接或子查询来计算 pos 指。这里有一种方法:

with st as (
      select st.*,
             sum(case when op = 'I' then 1 else -1 end) over (order by i) as pos
      from stack_trace st
     )
select st.*,
       (select val
        from st st2
        where st2.i <= st.id and st2.pos = st.pos and
              st2.val is not null
        order by i desc
        fetch first 1 row only
       ) as top_of_stack_val
from st;
xtupzzrd

xtupzzrd3#

teradata公司

select      s.i

           ,first_value (s.val) over 
            (
                partition by    s.depth
                order by        s.i
                reset when      s.op = 'I'
            )                                as top_of_stack_val

from       (select      s.i
                       ,s.val
                       ,s.op

                       ,sum (case s.op when 'I' then 1 else -1 end)   over
                        (
                            order by        s.i
                            rows            unbounded preceding
                        )                                                   as depth

            from        stack_trace s
            )
            s

order by    i
;
yrwegjxp

yrwegjxp4#

尽管这需要一个额外的步骤-

针对hive、teradata、oracle、sql server和postgresql的通用解决方案

select      s.i

           ,min (s.val) over
            (
                partition by    s.depth
                               ,s.depth_val_seq
            )                                   as top_of_stack_val            

from       (select      s.i
                       ,s.val
                       ,s.depth

                       ,count (s.val) over 
                        (
                            partition by    s.depth
                            order by        s.i
                            rows            between unbounded preceding and current row
                        )                                                                   as depth_val_seq

            from       (select      s.i
                                   ,s.val

                                   ,sum (case s.op when 'I' then 1 else -1 end)   over
                                    (
                                        order by        s.i
                                        rows            between unbounded preceding and current row
                                    )                                                                   as depth

                        from        stack_trace s
                        )
                        s
            )
            s

order by    i
;
s4n0splo

s4n0splo5#

这是一个很好的拼图。
由于我的主要dbms是teradata,因此我使用分析函数为其编写了一个解决方案(需要td14.10+):

SELECT dt.*,
   -- find the last item in the stack with the same position
   Last_Value(val IGNORE NULLS)
   Over (PARTITION BY pos
         ORDER BY i) AS top_of_stack_val
FROM 
 ( 
   SELECT st.*,
      -- calculate the number of items in the stack
      Sum(CASE WHEN op = 'I' THEN 1 ELSE -1 end) 
      Over (ORDER BY i
            ROWS Unbounded Preceding) AS pos
   FROM stack_trace AS st
 ) AS dt;

这个解决方案也适用于oracle,但是postgresql&sqlserver不支持 IGNORE NULLS 的选项 LAST_VALUE 而模拟它是相当复杂的,例如看伊茨克本根的最后一个非零的难题
编辑:其实没那么复杂,我忘了伊兹克的第二个解决方案,旧的背驮技巧;-)
martin smith的方法适用于所有四种DBMS。

相关问题