oracle跟踪带有时间戳列的表的历史记录

dy2hfwbg  于 2021-07-24  发布在  Java
关注(0)|答案(3)|浏览(360)

我在oracle12c数据库中有两个表,其结构如下。表a中有来自应用程序的传入数据,这些数据带有修改过的日期时间戳,每天我们可能会在表a中得到大约50000行。目标是使用表a的数据,并通过使用表a的数据作为驱动数据集插入到最终的目标表b(通常有数十亿行)。
只有当传入数据集属性发生更改时,才需要在表b中插入/合并记录。基本上,它的目的是仅当给定产品的属性(如state和zip)发生更改时,才使用有效的时间戳跟踪该产品的历史/旅程。
见下表

Table A  ( PRODUCT_ID, STATE, ZIP_CD, Modified_dt)

           'abc',  'MN', '123', '3/5/2020 12:01:00 AM'
           'abc',  'MN', '123', '3/5/2020  6:01:13 PM'
           'abc',  'IL', '223', '3/5/2020  7:01:15 PM'
           'abc',  'OH', '333', '3/5/2020  6:01:16 PM'
           'abc',  'NY', '722', '3/5/2020  4:29:00 PM' 
           'abc',  'KS', '444', '3/5/2020  4:31:41 PM'    
           'bbc',  'MN', '123', '3/19/2020 2:47:08 PM' 
           'bbc',  'IL', '223', '3/19/2020 2:50:37 PM' 
           'ccb',  'MN', '123', '3/21/2020 2:56:24 PM'
           'dbd',  'KS', '444', '6/20/2020 12:00:00 AM'  

Target Table B  (SEQUENCE_KEY,PRODUCT_ID,STATE, ZIP_CD, Valid_From, Valid_To,  LATEST_FLAG)
                '1',    'abc',    'AR', '999', '3/3/2020 12:00:00 AM', '3/3/2020 6:01:13 PM',   'N'
                '2',    'abc',    'AR', '555', '3/3/2020 6:01:14  PM',  '3/3/2020 6:01:14 PM',  'N'
                '3',    'abc',    'CA', '565', '3/3/2020 6:01:15  PM',  '3/4/2020 4:28:59 PM',  'N'
                '4',    'abc',    'CA', '777', '3/4/2020 4:29:00  PM',  '12/31/2099',           'Y'
                '5',    'bbc',    'MN', '123', '3/4/2020 4:31:41  PM',  '3/19/2020 2:47:07 PM', 'N'
                '6',    'bbc',    'MN', '666', '3/18/2020 2:47:08 PM',  '3/19/2020 2:50:36 PM', 'N'
                '7',    'bbc',    'MN', '777', '3/18/2020 2:50:37 PM',  '12/31/2099',         , 'Y'
                '8',    'ccb',    'MN', '123', '3/20/2020 2:56:24 PM',  '12/31/2099',           'Y'

将数据填充到表b的规则:
输出表上的主键是product\ id和valid\ from字段。来自表a的传入数据将始终具有比现有表更大的修改dt时间戳。
为了插入数据,我们必须比较来自目标表b的最新\u flag='y'记录和来自表a的传入数据,并且只有在属性状态和zip \u cd发生更改时,才需要从表a将记录插入表b。valid\ to column是一个计算字段,它总是比下一行的valid from date低1秒,对于最新一行,它的默认值为“12/31/2099”。类似地,latest\u flag列是一个计算的列,它指示给定产品标识的当前行
在传入的数据集中,如果有多个行与表b中的前一行或现有数据(latest_flag='y')相比没有任何更改,那么也应该忽略这些行。例如,表a中的第2行和第9行被忽略,因为与该产品的前几行相比,属性状态zip\u cd没有变化。
基于上述规则,我需要将表a数据合并到表b中,最终输出如下所示

Table B  (SEQUENCE_KEY,PRODUCT_ID,STATE, ZIP_CD, Valid_From, Valid_To,  LATEST_FLAG)
                '1',    'abc',    'AR', '999', '3/3/2020 12:00:00 AM', '3/3/2020 6:01:13 PM',  'N'
                '2',    'abc',    'AR', '555', '3/3/2020 6:01:14  PM'  '3/3/2020 6:01:14 PM',  'N'
                '3',    'abc',    'CA', '565', '3/3/2020 6:01:15  PM'  '3/4/2020 4:28:59 PM',  'N'
                '4',    'abc',    'CA', '777', '3/4/2020 4:29:00  PM'  '3/5/2020 12:00:00 AM', 'N'
                '5',    'abc',    'MN', '123', '3/5/2020 12:01:00 AM', '3/5/2020 7:01:14  PM', 'N'
                '6',    'abc',    'IL', '223'  '3/5/2020  7:01:15 PM', '3/5/2020 6:01:15 PM',  'N'
                '7',    'abc',    'OH', '333', '3/5/2020  6:01:16 PM', '3/5/2020 4:28:59 PM',  'N'
                '8',    'abc',    'NY', '722', '3/5/2020  4:29:00 PM', '3/5/2020 4:31:40  PM', 'N'
                '9',    'abc',    'KS', '444', '3/5/2020  4:31:41 PM', '12/31/2099',           'Y'
                '10',   'bbc',    'MN', '123', '3/4/2020 4:31:41  PM'  '3/19/2020 2:47:07 PM', 'N'
                '11',   'bbc',    'MN', '666', '3/18/2020 2:47:08 PM'  '3/19/2020 2:50:36 PM', 'N'
                '12',   'bbc',    'MN', '777', '3/18/2020 2:50:37 PM'  '3/19/2020 2:47:07 PM', 'N'
                '13',   'bbc',    'MN', '123', '3/19/2020 2:47:08 PM'  '3/19/2020 2:50:36 PM', 'N'
                '14',   'bbc',    'IL', '223', '3/19/2020 2:50:37 PM'  '12/31/2099',           'Y'
                '15',   'ccb',    'MN', '123', '3/20/2020 2:56:24 PM'  '12/31/2099',           'Y'
                '16',   'dbd',    'KS', '444', '6/20/2020 12:00:00 AM' '12/31/2099',           'Y'

寻找解决这个问题的建议。live sql链接:
https://livesql.oracle.com/apex/livesql/s/kfbx7dwzr3zz28v6eigv0ars0
谢谢您。

tyu7yeag

tyu7yeag1#

我试着看看如何在sql中做到这一点,但由于逻辑以及您在所需输出中的顺序键重置,这对我来说是不可能的。
所以,这里是我在pl/sql中的建议

SQL> select * from table_a ;

PRODUCT_ID                     STATE                          ZIP_CD                         MODIFIED_
------------------------------ ------------------------------ ------------------------------ ---------
abc                            MN                             123                            05-MAR-20
abc                            MN                             123                            05-MAR-20
abc                            IL                             223                            05-MAR-20
abc                            OH                             333                            05-MAR-20
abc                            NY                             722                            05-MAR-20
abc                            KS                             444                            05-MAR-20
bbc                            MN                             123                            19-MAR-20
bbc                            IL                             223                            19-MAR-20
ccb                            MN                             123                            19-MAR-20
dbd                            KS                             444                            19-MAR-20

10 rows selected.

SQL> select * from table_b ;

SEQUENCE_KEY PRODUCT_ID                     STATE                          ZIP_CD                         VALID_FRO VALID_TO  L
------------ ------------------------------ ------------------------------ ------------------------------ --------- --------- -
           1 abc                            AR                             999                            05-MAR-20 05-MAR-20 N
           2 abc                            AR                             555                            05-MAR-20 05-MAR-20 N
           3 abc                            CA                             565                            05-MAR-20 05-MAR-20 N
           4 abc                            CA                             777                            05-MAR-20 31-DEC-99 Y
           5 bbc                            MN                             123                            05-MAR-20 05-MAR-20 N
           6 bbc                            MN                             666                            05-MAR-20 05-MAR-20 N
           7 bbc                            MN                             777                            19-MAR-20 31-DEC-99 Y
           8 ccb                            MN                             123                            19-MAR-20 31-DEC-99 Y

8 rows selected.

现在,我使用了这段plu-sql代码

declare 
type typ_rec_set IS  RECORD
(
 PRODUCT_ID        VARCHAR2(30 CHAR),
 STATE             VARCHAR2(30 CHAR),
 ZIP_CD            VARCHAR2(30 CHAR),
 VALID_FROM        DATE             ,
 VALID_TO          DATE             ,
 LATEST_FLAG       VARCHAR2(1 CHAR) 
);
type typ_rec_tab is TABLE OF typ_rec_set;
l_hdr_tab  typ_rec_tab;
begin
SELECT product_id
      ,state
      ,zip_cd
      ,valid_from
      ,valid_to
      ,CASE WHEN valid_to = DATE '2099-12-31' THEN 'Y' ELSE 'N' END latest_flag
      BULK COLLECT INTO l_hdr_tab
FROM
(
SELECT a.product_id
      ,a.state
      ,a.zip_cd
      ,a.modified_dt valid_from
      ,NVL(((LEAD (a.modified_dt,1) OVER (PARTITION BY a.product_id ORDER BY a.modified_dt))  - INTERVAL '1' SECOND),DATE '2099-12-31' )valid_to
      ,CASE
          WHEN ( (    b.product_id IS NOT NULL 
                  AND a.state != b.state
                  AND a.zip_cd != b.zip_cd)
                OR b.product_id IS NULL
               ) THEN
           1
          ELSE
           0
       END insert_flag
FROM   table_a a
LEFT OUTER JOIN   table_b b
ON     a.product_id = b.product_id
AND    b.latest_flag = 'Y'
WHERE  (a.modified_dt >= b.valid_from OR b.product_id IS NULL)
ORDER BY a.product_id,a.modified_dt
)
WHERE insert_flag != 0  ;
--loop
FOR i IN l_hdr_tab.first .. l_hdr_tab.last 
LOOP
    -- begin block 
    begin
        insert into table_b 
        (
         sequence_key ,
         PRODUCT_ID   ,       
         STATE        ,       
         ZIP_CD       ,       
         VALID_FROM   ,       
         VALID_TO     ,       
         LATEST_FLAG                 
        )
        values
        (
        ( select max(sequence_key)+1 from table_b ),
        l_hdr_tab(i).product_id ,
        l_hdr_tab(i).state ,
        l_hdr_tab(i).zip_cd ,
        l_hdr_tab(i).valid_from ,
        l_hdr_tab(i).valid_to ,
        l_hdr_tab(i).latest_flag 
        );
     end;
end loop;-- reset sequence base of row_number over product_id valid_from
commit;
-- reset sequence
merge into table_b t
using ( select     sequence_key ,
         PRODUCT_ID   ,       
         STATE        ,       
         ZIP_CD       ,       
         VALID_FROM   ,       
         VALID_TO     ,       
         LATEST_FLAG  ,
         row_number() over ( order by product_id,valid_from ) as new_seq 
          from table_b ) s
on ( s.rowid = t.rowid ) 
when matched then 
  update set t.sequence_key = s.new_seq where t.sequence_key != s.new_seq ;
commit;
exception when others then raise;
end;
/

然后我运行它

SQL> host cat proc.sql
declare
type typ_rec_set IS  RECORD
(
 PRODUCT_ID        VARCHAR2(30 CHAR),
 STATE             VARCHAR2(30 CHAR),
 ZIP_CD            VARCHAR2(30 CHAR),
 VALID_FROM        DATE             ,
 VALID_TO          DATE             ,
 LATEST_FLAG       VARCHAR2(1 CHAR)
);
type typ_rec_tab is TABLE OF typ_rec_set;
l_hdr_tab  typ_rec_tab;
begin
SELECT product_id
      ,state
      ,zip_cd
      ,valid_from
      ,valid_to
      ,CASE WHEN valid_to = DATE '2099-12-31' THEN 'Y' ELSE 'N' END latest_flag
      BULK COLLECT INTO l_hdr_tab
FROM
(
SELECT a.product_id
      ,a.state
      ,a.zip_cd
      ,a.modified_dt valid_from
      ,NVL(((LEAD (a.modified_dt,1) OVER (PARTITION BY a.product_id ORDER BY a.modified_dt))  - INTERVAL '1' SECOND),DATE '2099-12-31' )valid_to
      ,CASE
          WHEN ( (    b.product_id IS NOT NULL
                  AND a.state != b.state
                  AND a.zip_cd != b.zip_cd)
                OR b.product_id IS NULL
               ) THEN
           1
          ELSE
           0
       END insert_flag
FROM   table_a a
LEFT OUTER JOIN   table_b b
ON     a.product_id = b.product_id
AND    b.latest_flag = 'Y'
WHERE  (a.modified_dt >= b.valid_from OR b.product_id IS NULL)
ORDER BY a.product_id,a.modified_dt
)
WHERE insert_flag != 0  ;
--loop
FOR i IN l_hdr_tab.first .. l_hdr_tab.last
LOOP
    -- begin block
    begin
        insert into table_b
        (
         sequence_key ,
         PRODUCT_ID   ,
         STATE        ,
         ZIP_CD       ,
         VALID_FROM   ,
         VALID_TO     ,
         LATEST_FLAG
        )
        values
        (
        ( select max(sequence_key)+1 from table_b ),
        l_hdr_tab(i).product_id ,
        l_hdr_tab(i).state ,
        l_hdr_tab(i).zip_cd ,
        l_hdr_tab(i).valid_from ,
        l_hdr_tab(i).valid_to ,
        l_hdr_tab(i).latest_flag
        );
     end;
end loop;-- reset sequence base of row_number over product_id valid_from
commit;
-- reset sequence
merge into table_b t
using ( select     sequence_key ,
         PRODUCT_ID   ,
         STATE        ,
         ZIP_CD       ,
         VALID_FROM   ,
         VALID_TO     ,
         LATEST_FLAG  ,
         row_number() over ( order by product_id,valid_from ) as new_seq
          from table_b ) s
on ( s.rowid = t.rowid )
when matched then
  update set t.sequence_key = s.new_seq where t.sequence_key != s.new_seq ;
commit;
exception when others then raise;
end;
/

SQL> @proc.sql

PL/SQL procedure successfully completed.

SQL> select * from table_b order by sequence_key ;

SEQUENCE_KEY PRODUCT_ID                     STATE                          ZIP_CD                         VALID_FRO VALID_TO  L
------------ ------------------------------ ------------------------------ ------------------------------ --------- --------- -
           1 abc                            AR                             999                            05-MAR-20 05-MAR-20 N
           2 abc                            NY                             722                            05-MAR-20 05-MAR-20 N
           3 abc                            CA                             777                            05-MAR-20 31-DEC-99 Y
           4 abc                            KS                             444                            05-MAR-20 05-MAR-20 N
           5 abc                            MN                             123                            05-MAR-20 05-MAR-20 N
           6 abc                            AR                             555                            05-MAR-20 05-MAR-20 N
           7 abc                            CA                             565                            05-MAR-20 05-MAR-20 N
           8 abc                            OH                             333                            05-MAR-20 05-MAR-20 N
           9 abc                            IL                             223                            05-MAR-20 31-DEC-99 Y
          10 bbc                            MN                             666                            05-MAR-20 05-MAR-20 N
          11 bbc                            MN                             123                            05-MAR-20 05-MAR-20 N

SEQUENCE_KEY PRODUCT_ID                     STATE                          ZIP_CD                         VALID_FRO VALID_TO  L
------------ ------------------------------ ------------------------------ ------------------------------ --------- --------- -
          12 bbc                            MN                             777                            19-MAR-20 31-DEC-99 Y
          13 bbc                            IL                             223                            19-MAR-20 31-DEC-99 Y
          14 ccb                            MN                             123                            19-MAR-20 31-DEC-99 Y
          15 dbd                            KS                             444                            19-MAR-20 31-DEC-99 Y

15 rows selected.

SQL>

你有什么疑问尽管告诉我。我知道我肯定错过了什么;)
更新
我意识到我在循环中有一个无用的操作,计算maxvalue为字段序列\ u键。我这里有一个更好的程序版本:

declare
type typ_rec_set IS  RECORD
(
 PRODUCT_ID        VARCHAR2(30 CHAR),
 STATE             VARCHAR2(30 CHAR),
 ZIP_CD            VARCHAR2(30 CHAR),
 VALID_FROM        DATE             ,
 VALID_TO          DATE             ,
 LATEST_FLAG       VARCHAR2(1 CHAR) 
);
type typ_rec_tab is TABLE OF typ_rec_set;
l_hdr_tab  typ_rec_tab;
r    pls_integer := 1;
vseq pls_integer;
begin
-- calculate value sequence 
select max(sequence_key) into vseq from table_b ;
SELECT product_id
      ,state
      ,zip_cd
      ,valid_from
      ,valid_to
      ,CASE WHEN valid_to = DATE '2099-12-31' THEN 'Y' ELSE 'N' END latest_flag
      BULK COLLECT INTO l_hdr_tab
FROM
(
SELECT a.product_id
      ,a.state
      ,a.zip_cd
      ,a.modified_dt valid_from
      ,NVL(((LEAD (a.modified_dt,1) OVER (PARTITION BY a.product_id ORDER BY a.modified_dt))  - INTERVAL '1' SECOND),DATE '2099-12-31' )valid_to
      ,CASE
          WHEN ( (    b.product_id IS NOT NULL 
                  AND a.state != b.state
                  AND a.zip_cd != b.zip_cd)
                OR b.product_id IS NULL
               ) THEN
           1
          ELSE
           0
       END insert_flag
FROM   table_a a
LEFT OUTER JOIN   table_b b
ON     a.product_id = b.product_id
AND    b.latest_flag = 'Y'
WHERE  (a.modified_dt >= b.valid_from OR b.product_id IS NULL)
ORDER BY a.product_id,a.modified_dt
)
WHERE insert_flag != 0  ;
--loop
FOR i IN l_hdr_tab.first .. l_hdr_tab.last 
LOOP
    -- begin block 
    vseq := vseq + r ;
    begin
        insert into table_b 
        (
         sequence_key ,
         PRODUCT_ID   ,       
         STATE        ,       
         ZIP_CD       ,       
         VALID_FROM   ,       
         VALID_TO     ,       
         LATEST_FLAG                 
        )
        values
        (
        vseq ,
        l_hdr_tab(i).product_id ,
        l_hdr_tab(i).state ,
        l_hdr_tab(i).zip_cd ,
        l_hdr_tab(i).valid_from ,
        l_hdr_tab(i).valid_to ,
        l_hdr_tab(i).latest_flag 
        );
     end;
    r := r + 1; 
end loop;-- reset sequence base of row_number over product_id valid_from
commit;
-- reset sequence
merge into table_b t
using ( select     sequence_key ,
         PRODUCT_ID   ,       
         STATE        ,       
         ZIP_CD       ,       
         VALID_FROM   ,       
         VALID_TO     ,       
         LATEST_FLAG  ,
         row_number() over ( order by product_id,valid_from ) as new_seq 
          from table_b ) s
on ( s.rowid = t.rowid ) 
when matched then 
  update set t.sequence_key = s.new_seq where t.sequence_key != s.new_seq ;
commit;
exception when others then raise;
end;
/
ycl3bljg

ycl3bljg2#

我会以我的理解力第一次尝试。作为插入tableb的源的游标如下所示,

SELECT product_id
      ,state
      ,zip_cd
      ,valid_from
      ,valid_to
      ,CASE WHEN valid_to = DATE '2099-12-31' THEN 'Y' ELSE 'N' END latest_flag
FROM
(
SELECT a.product_id
      ,a.state
      ,a.zip_cd
      ,a.modified_dt valid_from
      ,NVL(((LEAD (a.modified_dt,1) OVER (PARTITION BY a.product_id ORDER BY a.modified_dt))  - INTERVAL '1' SECOND),DATE '2099-12-31' )valid_to
      ,CASE
          WHEN ( (    b.product_id IS NOT NULL 
                  AND a.state != b.state
                  AND a.zip_cd != b.zip_cd)
                OR b.product_id IS NULL
               ) THEN
           1
          ELSE
           0
       END insert_flag
FROM   table_a a
LEFT OUTER JOIN   table_b b
ON     a.product_id = b.product_id
AND    b.latest_flag = 'Y'
WHERE  (a.modified_dt >= b.valid_from OR b.product_id IS NULL)
ORDER BY a.product_id,a.modified_dt
)
WHERE insert_flag != 0;

left outer join检查记录是否存在于tableb中,where子句检查最新的\u标志“y”的修改日期是否大于有效的\u from
内部case语句将告诉我们属性是否更改,如果产品id不存在,它也会将其视为第一个条目,insert\u标志将为1
根据31-12-2099的“修改日期”列,在最后一条记录的情况下,outer case语句提供有效的\u to
关于第三点我还不完全清楚,但我相信案例陈述正是我们所需要的。
最后我没有考虑性能问题。您可以考虑将其转换为pl/sql块和其他收集方法来处理块中的数据。
这里还有一个问题,如果产品id为“dbd”(这是一个新条目,在tableb中不存在)的记录在tablea中多次出现,会发生什么?

35g0bw71

35g0bw713#

这就是数据仓库中的缓慢变化维度(scd)类型2问题(kimball方法)。你可以在这里看到一个简短的定义
https://www.oracle.com/webfolder/technetwork/tutorials/obe/db/10g/r2/owb/owb10gr2_gs/owb/lesson3/slowlychangingdimensions.htm
对scd类型2的支持仅在owb 10gr2的企业etl选项中提供,如上述链接所述。如果这不可用并且必须使用pl/sql,那么可以检查以下方法。不幸的是,oraclepl/sql不像mssql那样提供直接的解决方案。
在oracle中实现类型2 scd

相关问题