db2 SQL:如何计算每个时间段和名称重置的累计和?

carvr3hs  于 2023-08-05  发布在  DB2
关注(0)|答案(1)|浏览(142)

假设我有下表。

CREATE TABLE transaction (
  "ID" INTEGER PRIMARY KEY,
  "NAME" VARCHAR(4),
  "TIMESTAMP" INTEGER,
  "QUANTITY" INTEGER
);

INSERT INTO transaction
  ("ID", "NAME", "TIMESTAMP", "QUANTITY")
VALUES
  ('1', 'dani', '1686311907', '1'),
  ('2', 'dani', '1686312071', '4'),
  ('3', 'dani', '1686748928', '2'),
  ('4', 'pet', '1687937005', '2'),
  ('5', 'pet', '1688109281', '6');

字符串
其对应于:

| ID  | NAME | TIMESTAMP  | QUANTITY |
| --- | ---- | ---------- | -------- |
| 1   | dani | 1686311907 | 1        |
| 2   | dani | 1686312071 | 4        |
| 3   | dani | 1686748928 | 2        |
| 4   | pet  | 1687937005 | 2        |
| 5   | pet  | 1688109281 | 6        |


此表描述了一系列事务处理。一般来说,我想要的是,对于每个名称和每个时间戳,计算每个工作槽内数量的累积和。每个工作时隙将包括在小于10800秒(相当于3小时)的时间间隔内的那些时间戳。换句话说,必须在每个工作槽上为每个名称重置累积和。
我希望得到的是一个像下面这样的表:

| ID  | NAME | TIMESTAMP  | QUANTITY | CUM_QUANTITY |
| --- | ---- | ---------- | -------- | ------------
| 1   | dani | 1686311907 | 1        | 1            |
| 2   | dani | 1686312071 | 4        | 5            |
| 3   | dani | 1686748928 | 2        | 2            | # new working slot
| 4   | pet  | 1687937005 | 2        | 2            | # new name
| 5   | pet  | 1688109281 | 6        | 6            | # new working slot


到目前为止,我尝试添加一个过渡列来检查记录是否属于新的一天:

select 
  *, 
  CASE WHEN coalesce(
    "TIMESTAMP" - lag("TIMESTAMP", 1) over (
      partition by "NAME" 
      order by 
        "TIMESTAMP" asc
    ), 
    0
  ) < 3600 * 3 THEN 0 ELSE 1 END AS NEW_DAY 
from 
  transaction 
ORDER BY 
  "NAME", 
  "TIMESTAMP"


这给予了我们下表:

| ID  | NAME | TIMESTAMP  | QUANTITY | new_day |
| --- | ---- | ---------- | -------- | ------- |
| 1   | dani | 1686311907 | 1        | 0       |
| 2   | dani | 1686312071 | 4        | 0       |
| 3   | dani | 1686748928 | 2        | 1       |
| 4   | pet  | 1687937005 | 2        | 0       |
| 5   | pet  | 1688109281 | 6        | 1       |


然后通过使用NAME和new_day并按SUM(QUANTITY)聚合来执行某种窗口,但这种方法不起作用,因为new_day只接受值0和1,它应该从0,1,...到每个用户的最大工作时隙。
有谁知道该怎么做吗?

2eafrhcq

2eafrhcq1#

请参阅OLAP specification主题中有关RANGE OLAP函数用法的内容。

WITH transaction ("ID", "NAME", "TIMESTAMP", "QUANTITY") AS 
(
VALUES
  (1, 'dani', 1686311907, 1),
  (2, 'dani', 1686312071, 4),
  (3, 'dani', 1686748928, 2),
  (4, 'pet',  1687937005, 2),
  (5, 'pet',  1688109281, 6)
)
SELECT 
  * 
, sum ("QUANTITY") over (partition by "NAME" order by "TIMESTAMP" range between 10800 preceding and current row)
  as CUM_QUANTITY
FROM transaction

字符串
| 姓名|时间戳|数量|CUM_数量| CUM_QUANTITY |
| --|--|--|--| ------------ |
| 达尼|1686311907|一个|一个| 1 |
| 达尼|电话:1686312071|四个|五个| 5 |
| 达尼|1686748928|二个|二个| 2 |
| 宠物|电话:1687937005|二个|二个| 2 |
| 宠物|1688109281|六个|六个| 6 |
fiddle

更新:

WORKING_SLOT人工柱。
每个名称内部的计算基于:

  • 计算每行的“标志”值-当当前和最后时间戳之间的差小于3小时时为0,否则为1
  • 这些标志的累积和

结果在此数据集上相同,但在其他数据集上可能不同。

WITH 
transaction ("ID", "NAME", "TIMESTAMP", "QUANTITY") AS 
(
VALUES
  (1, 'dani', 1686311907, 1),
  (2, 'dani', 1686312071, 4),
  (3, 'dani', 1686748928, 2),
  (4, 'pet',  1687937005, 2),
  (5, 'pet',  1688109281, 6)
)
, transaction_with_working_slot AS 
(
SELECT 
  * 
, SUM 
(
  CASE 
    WHEN "TIMESTAMP" - COALESCE (LAG ("TIMESTAMP") OVER (PARTITION BY "NAME" ORDER BY "TIMESTAMP"), 0) < 10800
    THEN 0
    ELSE 1
  END
) OVER (PARTITION BY "NAME" ORDER BY "TIMESTAMP")
  AS WORKING_SLOT
FROM transaction
)
SELECT 
  *
, SUM ("QUANTITY") OVER (PARTITION BY "NAME", "WORKING_SLOT" ORDER BY "TIMESTAMP")
  AS CUM_QUANTITY
FROM transaction_with_working_slot
ORDER BY "NAME", "TIMESTAMP"


| 姓名|时间戳|数量|工作_插槽|CUM_数量| CUM_QUANTITY |
| --|--|--|--|--| ------------ |
| 达尼|1686311907|一个|一个|一个| 1 |
| 达尼|电话:1686312071|四个|一个|五个| 5 |
| 达尼|1686748928|二个|二个|二个| 2 |
| 宠物|电话:1687937005|二个|一个|二个| 2 |
| 宠物|1688109281|六个|二个|六个| 6 |
fiddle

相关问题