sql—将分隔字符串拆分为多个列和行

agxfikkp  于 2021-07-24  发布在  Java
关注(0)|答案(3)|浏览(281)

我在oracle12c中有一个数据集,它显示了用户通过网站点击的路径。路径是一个带分隔符的字符串,但我需要将各个步骤拆分为多行,其中from和to步骤显示为跨多行的两个独立列,这些行按原始路径、user和date键分组。
使用以下示例数据集:

SELECT 'USER_A', '2020-08-07', '|A|B|C' FROM DUAL
UNION
SELECT 'USER_B', '2020-08-07', '|G|H|I|J|K' FROM DUAL
UNION
SELECT 'USER_B', '2020-08-06', '|A|B|C' FROM DUAL

输入数据如下所示:

User    Date        WebPath
USER_A  2020-08-07  |A|B|C
USER_B  2020-08-06  |A|B|C
USER_B  2020-08-07  |G|H|I|J|K

所以在2020-08-07,用户a从a点到b点,然后从b点到c点。
我需要输出看起来像下面的输出

USER    DATE        STARTPOINT  ENDPOINT     WEBPATH
USER_A  2020-08-07  A           B            |A|B|C
USER_A  2020-08-07  B           C            |A|B|C   
USER_B  2020-08-06  A           B            |A|B|C
USER_B  2020-08-06  B           C            |A|B|C
USER_B  2020-08-07  G           H            |G|H|I|J|K 
USER_B  2020-08-07  H           I            |G|H|I|J|K
USER_B  2020-08-07  I           J            |G|H|I|J|K
USER_B  2020-08-07  J           K            |G|H|I|J|K
d7v8vwbk

d7v8vwbk1#

您可以与递归子查询+regexp\u count+regexp\u substr一起使用:

select *
from t
    ,lateral(
       select
          level n
         , regexp_substr(t.WebPath, '\|([^|]+)',1,level  ,null,1) s1
         , regexp_substr(t.WebPath, '\|([^|]+)',1,level+1,null,1) s2
       from dual
       connect by level<regexp_count(WebPath,'\|[^|]+')
    ) x

完整示例:

with t(UserName, DateCol, WebPath) as (
SELECT 'USER_A', '2020-08-07', '|A|B|C' FROM DUAL
UNION
SELECT 'USER_B', '2020-08-07', '|G|H|I|J|K' FROM DUAL
UNION
SELECT 'USER_B', '2020-08-06', '|A|B|C' FROM DUAL
)
select *
from t
    ,lateral(
       select
          level n
         , regexp_substr(t.WebPath, '\|([^|]+)',1,level  ,null,1) s1
         , regexp_substr(t.WebPath, '\|([^|]+)',1,level+1,null,1) s2
       from dual
       connect by level<regexp_count(WebPath,'\|[^|]+')
    ) x;

结果:

USERNAME   DATECOL    WEBPATH             N S1     S2
---------- ---------- ---------- ---------- ------ ------
USER_A     2020-08-07 |A|B|C              1 A      B
USER_A     2020-08-07 |A|B|C              2 B      C
USER_B     2020-08-06 |A|B|C              1 A      B
USER_B     2020-08-06 |A|B|C              2 B      C
USER_B     2020-08-07 |G|H|I|J|K          1 G      H
USER_B     2020-08-07 |G|H|I|J|K          2 H      I
USER_B     2020-08-07 |G|H|I|J|K          3 I      J
USER_B     2020-08-07 |G|H|I|J|K          4 J      K

8 rows selected.
xvw2m8pv

xvw2m8pv2#

可以使用普通递归子查询分解:

with rcte (usr, dt, path, pos, startpoint, endpoint) as (
  select usr, dt, path, 1,
    regexp_substr(path, '(.*?)(\||$)', 1, 2, null, 1),
    regexp_substr(path, '(.*?)(\||$)', 1, 3, null, 1)
  from your_table
  union all
  select usr, dt, path, pos + 1,
    regexp_substr(path, '(.*?)(\||$)', 1, pos + 2, null, 1),
    regexp_substr(path, '(.*?)(\||$)', 1, pos + 3, null, 1)
  from rcte
  where regexp_substr(path, '(.*?)(\||$)', 1, pos + 3, null, 1) is not null
)
select usr, dt, startpoint, endpoint, path
from rcte
order by usr, dt, pos

用你的数据可以得到:

USR    | DT        | STARTPOINT | ENDPOINT | PATH      
:----- | :-------- | :--------- | :------- | :---------
USER_A | 07-AUG-20 | A          | B        | |A|B|C    
USER_A | 07-AUG-20 | B          | C        | |A|B|C    
USER_B | 06-AUG-20 | A          | B        | |A|B|C    
USER_B | 06-AUG-20 | B          | C        | |A|B|C    
USER_B | 07-AUG-20 | G          | H        | |G|H|I|J|K
USER_B | 07-AUG-20 | H          | I        | |G|H|I|J|K
USER_B | 07-AUG-20 | I          | J        | |G|H|I|J|K
USER_B | 07-AUG-20 | J          | K        | |G|H|I|J|K

db<>小提琴

42fyovps

42fyovps3#

我的甲骨文技能有些生疏,但有一个想法可能有点太老套了: duplicator 返回数字 point_idx = 1, 2, ..., max_path_size 哪里 max_path_size 从计数中找到 | 中的字符 webpath .
在主查询中,您使用replicator对输入数据进行笛卡尔积,因此您可以使用不同的 point_idx 值从1到 max_path_size . 用这个来获取电流( point_idx a)子串 webpath 下一个呢( point_idx+ 1 th)要打印的子字符串。

WITH
input (user, date, webpath) AS (
  SELECT 'USER_A', '2020-08-07', '|A|B|C' FROM DUAL UNION
  SELECT 'USER_B', '2020-08-07', '|G|H|I|J|K' FROM DUAL UNION
  SELECT 'USER_B', '2020-08-06', '|A|B|C' FROM DUAL
),
duplicator(point_idx) AS (
  SELECT LEVEL
  FROM DUAL 
  CONNECT BY LEVEL <= (
    SELECT MAX(REGEXP_COUNT(webpath, '|')) m
    FROM input
  ) 
)
SELECT 
  input.*, 
  REGEXP_SUBSTR(webpath, '(\|)([^|]*)', 1, point_idx ,    'ix', 1)) AS startpoint, 
  REGEXP_SUBSTR(webpath, '(\|)([^|]*)', 1, point_idx + 1, 'ix', 1)) AS endpoint
FROM input, duplicator
WHERE point_idx <= REGEXP_COUNT(webpath, '|');

相关问题