将带有html标记的列值转换为带有行和列的sql视图

rm5edbpk  于 2021-07-26  发布在  Java
关注(0)|答案(2)|浏览(312)

我有一个名为data的表,其中有一列desc\u data。此列的值如下:

<span class ="label">A</span><br> <span class ="value">A-Class</span> <span class ="label">B</span><br> <span class ="value">B-Class</span>.

我想解析这个列值,剥离html标记并使用sql查询(regexp\u replace maybe)将其拆分到一个新视图中,这样:所有标签值都变成列,即。 <span class ="label"> A & <span class ="label">B 将成为列,并且 <span class ="value">A-Class & <span class ="value">B-Class 将分别成为列值。
实际的数据要多得多,有很多标签和值,但这只是获取帮助的示例。预期结果应为:
查看数据\u查看

A          B
A-Class    B-Class
kxkpmulp

kxkpmulp1#

我认为以行而不是列的形式获取所需数据会方便得多。您可以使用xmltable解析它,只需对原始html稍加修改(删除未关闭的标记,如 <br> . 这就是为什么 <br/> 更好):

with t as (
  -- your sample data:
  select
    q'[<span class ="label">A</span><br> <span class ="value">A-Class</span> <span class ="label">B</span><br> <span class ="value">B-Class</span>.
  ]' html_data
from dual
)
-- main query:
select xt.*
from t
    ,xmltable(
      'let $labels := /root/span[@class eq "label"]
       let $values := /root/span[@class eq "value"]
       for $label at $i in $labels
          return element label {
             attribute name {$label/text()}, 
             attribute value {$values[$i]/text()}
          }
       '
      passing
      xmltype(
       --- modify your html to make it compatible with xml:
       '<root>'
       || replace(replace(t.html_data,'<br>'),'&nbsp;')
       ||'</root>'
      )
      columns
         n for ordinality,
         label_name path '@name',
         label_value path '@value'
    ) xt;

结果:

N LABEL_NAME                     LABEL_VALUE
---------- ------------------------------ ------------------------------
         1 A                              A-Class
         2 B                              B-Class
mepcadol

mepcadol2#

你需要用一些模式来分割你的字符串(例如 '/span> <span' )递归地。通过使用 REGEXP_REPLACE() 函数,然后应用旋转:

WITH t(desc_data) AS
(
 SELECT '<span class ="label">A</span><br> <span class ="value">A-Class</span> <span class ="label">B</span><br> <span class ="value">B-Class</span> <span class ="label">C</span><br> <span class ="value">C-Class</span>'
   FROM dual
), t2 AS
(
SELECT SUBSTR(desc_data,1,CASE WHEN INSTR(desc_data,'/span> <span',1,level) > 0
                               THEN INSTR(desc_data,'/span> <span',1,level) + 5 
                               ELSE LENGTH(desc_data)
                           END
             ) AS desc_data2
  FROM t
 CONNECT BY level <= REGEXP_COUNT(desc_data,'/span> <span') + 1
)
SELECT *
  FROM
  (
   SELECT REGEXP_REPLACE(desc_data2,'(.*"label">)(\S+)(</span>.*)','\2') AS label,
          REGEXP_REPLACE(desc_data2,'(.*"value">)(\S+)(</span>.*)','\2') AS value
     FROM t2 )
 PIVOT ( MAX(VALUE) FOR LABEL IN ('A' AS "A", 'B' AS "B", 'C' AS "C") );

A          B          C
-------    -------    -------
A-Class    B-Class    C-Class

演示

相关问题