删除Oracle中逗号分隔字符串中的重复值

2ic8powd  于 2023-01-25  发布在  Oracle
关注(0)|答案(4)|浏览(757)

我需要你的帮助来使用regexp_replace函数。我有一个表,其中有一列包含重复的串接字符串值。我如何消除它们?
示例:

Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha

我需要输出为

Ian,Beatty,Larry,Neesha

复制品是随机的,没有任何特定的顺序。
最新消息
这是我的table的外观

ID   Name1   Name2    Name3     
1     a       b         c
1     c       d         a
2     d       e         a
2     c       d          b

我需要每个ID占一行,其中name1、name2、name3是不同的,以逗号分隔的字符串形式存在于一行中。

ID    Name
1     a,c,b,d,c
2     d,c,e,a,b

我试过使用listagg和distinct,但是我不能删除重复的。

ulydmbyx

ulydmbyx1#

我觉得最简单的选择-

SELECT ID, LISTAGG(NAME_LIST, ',')
  FROM (SELECT ID, NAME1 NAME_LIST FROM DATA UNION
        SELECT ID, NAME2 FROM DATA UNION
        SELECT ID, NAME3 FROM DATA
      )
GROUP BY ID;

Demo.

jdgnovmf

jdgnovmf2#

所以,试试这个...

([^,]+),(?=.*[A-Za-z],[] ]*\1)
lmyy7pcs

lmyy7pcs3#

如果重复的值不是相邻的,我不认为仅仅使用regexp_replace就可以做到这一点,一种方法是将值拆分,消除重复,然后将它们放回一起。
标记分隔字符串的常用方法是使用regexp_substrconnect by子句,在字符串中使用绑定变量可以使代码更清晰:

var value varchar2(100);
exec :value := 'Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha';

select regexp_substr(:value, '[^,]+', 1, level) as value
from dual
connect by regexp_substr(:value, '[^,]+', 1, level) is not null;

VALUE                        
------------------------------
Ian                           
Beatty                        
Larry                         
Neesha                        
Beatty                        
Neesha                        
Ian                           
Neesha

您可以将其用作子查询(或CTE),从中获取不同的值,然后使用listagg重新组装它:

select listagg(value, ',') within group (order by value) as value
from (
  select distinct value from (
    select regexp_substr(:value, '[^,]+', 1, level) as value
    from dual
    connect by regexp_substr(:value, '[^,]+', 1, level) is not null
  )
);

VALUE                        
------------------------------
Beatty,Ian,Larry,Neesha

如果你在一个表中查看多行,这就有点复杂了,因为这会混淆connect-by语法,但是你可以使用一个非确定性的引用来避免循环:

with t42 (id, value) as (
  select 1, 'Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha' from dual
  union all select 2, 'Mary,Joe,Mary,Frank,Joe' from dual
)
select id, listagg(value, ',') within group (order by value) as value
from (
  select distinct id, value from (
    select id, regexp_substr(value, '[^,]+', 1, level) as value
    from t42
    connect by regexp_substr(value, '[^,]+', 1, level) is not null
    and id = prior id
    and prior dbms_random.value is not null
  )
)
group by id;

        ID VALUE                        
---------- ------------------------------
         1 Beatty,Ian,Larry,Neesha       
         2 Frank,Joe,Mary

当然,如果您正确地存储了关系数据,这就没有必要了;在列中使用分隔字符串不是一个好主意。

dldeef67

dldeef674#

在这种情况下,有一种方法可以找到重复项,但如果每个id在一个字符串中有多个重复的名称,则删除它们会有问题。下面是可以处理每个id一个重复项的代码。
样本数据:

WITH
    tbl AS
        (
            Select 1 "ID", 'a' "NAME_1", 'b' "NAME_2", 'c' "NAME_3" From Dual Union All
            Select 1 "ID", 'c' "NAME_1", 'd' "NAME_2", 'a' "NAME_3" From Dual Union All
            Select 2 "ID", 'd' "NAME_1", 'e' "NAME_2", 'a' "NAME_3" From Dual Union All
            Select 2 "ID", 'c' "NAME_1", 'd' "NAME_2", 'b' "NAME_3" From Dual 
        ),
    lists AS
        (
            Select 1 "ID", 'a,c,b,d,c' "NAME" From Dual Union All
            Select 2 "ID", 'd,c,e,a,b' "NAME" From Dual  
        ),

创建CTE,将您的LISTAGG ststring与原始数据进行比较,查找重复值:

grid AS
    (
        Select DISTINCT l.ID, l.NAME,
            CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_1 || ',', '')) ) / Length(t.NAME_1 || ',') > 1 THEN NAME_1 END  "NAME_1",
            CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_2 || ',', '')) ) / Length(t.NAME_2 || ',') > 1 THEN NAME_2 END  "NAME_2",
            CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_3 || ',', '')) ) / Length(t.NAME_3 || ',') > 1 THEN NAME_3 END  "NAME_3"
        From
            lists l
        Inner Join
            tbl t ON(t.ID = l.ID) 
    )

        ID NAME      NAME_1 NAME_2 NAME_3
---------- --------- ------ ------ ------
         2 d,c,e,a,b                      
         1 a,c,b,d,c c                    
         1 a,c,b,d,c               c

主SQL使用Union在发现重复的地方构建新字符串(删除第二次出现的字符串),然后在与旧字符串比较后放置新字符串。

SELECT DISTINCT l.ID, Nvl(g.NAME, l.NAME) NAME
FROM
    lists l
LEFT JOIN
    (
        SELECT ID,  CASE  WHEN NAME_1 Is Not Null 
                          THEN  REPLACE(NAME, NAME, COALESCE( REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_1, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_1, 1, 2) + Length(NAME_1)), ',,', ','), NULL ) ) 
                    END "NAME"
        FROM grid
        WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
    UNION ALL
        SELECT ID,  CASE  WHEN NAME_2 Is Not Null 
                          THEN  REPLACE(NAME, NAME, COALESCE( REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_2, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_2, 1, 2) + Length(NAME_2)), ',,', ','), NULL ) ) 
                    END "NAME"
        FROM grid
        WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
    UNION ALL
        SELECT ID,  CASE  WHEN NAME_3 Is Not Null 
                          THEN  REPLACE(NAME, NAME, COALESCE( REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_3, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_3, 1, 2) + Length(NAME_3)), ',,', ','), NULL ) ) 
                    END "NAME"
        FROM grid
        WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
    ) g ON(g.ID = l.ID And Length(g.NAME) < Length(l.NAME))

R e s u l t :
        ID NAME         
---------- -------------
         2 d,c,e,a,b    
         1 a,c,b,d

对于一个字符串中的多个示例或多个不同的名称,应该进行一些递归或多个嵌套来完成它...

相关问题