Oracle Regex查询

whitzsjs  于 2022-10-04  发布在  Oracle
关注(0)|答案(2)|浏览(107)

我正在处理下面的正则表达式。我需要提取“no”之后的子字符串。但我有一个问题,在下面的4个和5个ID,其中预期输出是10A-20A和10-20,但实际上我得到的是10A和10。

with test (id, col) as (
  select 1, 'JOHN NO.10A 1234 TEST1'  from dual union all
  select 2, 'JOHN NO.10 1234 TEST1'  from dual union all
  select 3, 'JOHN NO.10A 1234 TEST1'  from dual union all
  select 4, 'JOHN NO.10A TEST1 - JOHN NO.20A TEST2'      from dual union all
  select 5, 'JOHN NO.10 TEST1 - JOHN NO.20 TEST2'      from dual union all
  select 6, 'JOHN NO.10A TEST1234'  from dual union all
   select 7, 'JOHN 10A TEST1234'   from dual
)
select
  id,
  col,
  regexp_substr(col, '(NO.)(S*)(s?)',1,1,'i',2) as qwe
from test

ID|Col|预期
-|-|
1|约翰10A 1234测试1|10A
2|约翰10号1234测试1|10
3|约翰10A 1234测试1|10A
4|John No.10A Test1-John No.20A TEST2|10A-20A
5|约翰10号测试1-约翰20号测试2|10-20
6|John NO.10A TEST1234|10A
7|John 10A TEST1234|空

Db<>小提琴here

whlutmcx

whlutmcx1#

您可以使用递归CTE或分层查询来拆分要查找的模式的多个匹配项:

select
  id,
  col,
  level as lvl,
  regexp_substr(col, '(NO.)(S*)(s?)',1,level,'i',2) as qwe
from test
connect by id = prior id
and prior dbms_random.value is not null
and level <= regexp_count(col, '(NO.)(S*)(s?)')

ID|Col|lvl|QWE
-|-|
1|John NO.10A 1234测试1|1|10A
2|约翰10号1234测试1|1|10
3|约翰10A 1234测试1|1|10A
4|John No.10A Test1-John No.20A TEST2|1|10A
4|John No.10A Test1-John No.20A TEST2|2|20A
5|约翰10号测试1-约翰20号测试2|1|10
5|约翰10号测试1-约翰20号测试2|2|20
6|John NO.10A TEST1234|1|10A
7|John 10A TEST1234|1|

..。然后通过一个子查询将它们聚合回每个ID的单个字符串:

select
  id,
  col,
  listagg(qwe, ' - ') within group (order by lvl) as qwe
from (
  select
    id,
    col,
    level as lvl,
    regexp_substr(col, '(NO.)(S*)(s?)',1,level,'i',2) as qwe
  from test
  connect by id = prior id
  and prior dbms_random.value is not null
  and level <= regexp_count(col, '(NO.)(S*)(s?)')
)
group by id, col

ID|Col|QWE
-|-|
1|约翰10A 1234测试1|10A
2|约翰10号1234测试1|10
3|约翰10A 1234测试1|10A
4|John No.10A Test1-John No.20A TEST2|10A-20A
5|约翰10号测试1-约翰20号测试2|10-20
6|John NO.10A TEST1234|10A
7|John 10A TEST1234|

或直接执行以下操作:

-- with connect-by to get multiple values and listagg to join them
select
  id,
  col,
  listagg(regexp_substr(col, '(NO.)(S*)(s?)',1,level,'i',2), ' - ')
    within group (order by level) as qwe
from test
connect by id = prior id
and prior dbms_random.value is not null
and level <= regexp_count(col, '(NO.)(S*)(s?)')
group by id, col

ID|Col|QWE
-|-|
1|约翰10A 1234测试1|10A
2|约翰10号1234测试1|10
3|约翰10A 1234测试1|10A
4|John No.10A Test1-John No.20A TEST2|10A-20A
5|约翰10号测试1-约翰20号测试2|10-20
6|John NO.10A TEST1234|10A
7|John 10A TEST1234|

fiddle

您已经用多个数据库版本标记了您的查询;如果您真的使用10g(或11gr1),那么listagg()将不可用,但有其他用于字符串聚合的选项。或者,如果您知道最多只有两个示例,您可以将它们单独取出并连接起来--只需工作即可仅在需要时拥有' - '连接器。

1mrurvl1

1mrurvl12#

with test (id, col) as (
    select 1, 'JOHN NO.10A 1234 TEST1' from dual union all 
    select 2, 'JOHN NO.10 1234 TEST1' from dual union all 
    select 3, 'JOHN NO.10A 1234 TEST1' from dual union all 
    select 4, 'JOHN NO.10A TEST1 - JOHN NO.20A TEST2' from dual union all 
    select 5, 'JOHN NO.10 TEST1 - JOHN NO.20 TEST2' from dual union all 
    select 6, 'JOHN NO.10A TEST1234' from dual union all select 7, 'JOHN 10A TEST1234' from dual 
),
ctest as (
    select id, col, regexp_count(col, '(NO.)(S*)(s?)') as c
    from test
)
select id, col, decode(c, 0, null, regexp_substr(col, '(NO.)(S*)(s?)', 1 , c, 'i' ))  as qwe
from ctest
;

相关问题