Extracting SLA information from a PostgreSQL table using text matching and value extraction

4zcjmb1e  posted on 2023-04-11 in PostgreSQL

I have the following PostgreSQL table:

CREATE TABLE test (
  id INT,
  description TEXT
);

INSERT INTO test VALUES 
(1, 'Some text'),
(2, '123 blabla The average processing time for this type of request is 5 days. blabla'),
(3, 'blalbla The average processing time for this type of request is 5 days.
This delay is reduced to 1 day for requests with high or critical priority. blabla'),
(4, 'blalbla The average processing time for this type of request is 7 days.
This delay is reduced to 2 day for requests with high or critical priority. blabla'),
(5, 'blabla The average processing time for this type of request is 3 days. blabla');

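I need to get the following output:
| id | SLA text | days |
| -------------- | -------------- | -------------- |
| 1 | No | |
| 2 | Yes | 5 |
| 3 | Yes | 1/5 |
| 4 | Yes | 2/7 |
| 5 | Yes | 3 |

Currently, I can tell whether the field contains an SLA value with the following query: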

SELECT 
    id,
    CASE 
        WHEN regexp_replace(description, ' ', ' ', 'g') ILIKE '%The average processing%' 
        THEN 'Yes'
        ELSE 'No'
    END AS "SLA text"
FROM 
    test

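I need to check whether the description field contains one of the following texts:

  • The average processing time for this type of request is 5 days.
  • The average processing time for this type of request is 5 days. This delay is reduced to 1 day for requests with high or critical priority.

If either text is present, the field (SLA text) should be marked "Yes"; otherwise it should be marked "No".
If the field is marked "Yes", the integer values should be retrieved from the text.
For example, if the text is blabla The average processing time for this type of request is 5 days. blabla, the value to retrieve is 5. If the text is The average processing time for this type of request is 5 days. This delay is reduced to 1 day for requests with high or critical priority., the value to retrieve is 1/5. The retrieved value should be stored in the days column.
Demo: https://www.db-fiddle.com/f/iNxLeZosApNzTyp9RNTK4r/1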

0pizxfdo1#

You can also do this with regexp_replace:

select id, "SLA text",
    case when "SLA text" = 'Yes' then
        trim(leading '/' from regexp_replace(text,'(?:.*The average processing time for this type of request is (\d+) days\.)(?:
This delay is reduced to (\d+) day for requests with high or critical priority.)?.*' , '\2/\1', 'g'))
    else '' end "SLA2 text"
from(
  SELECT 
      id,
      CASE 
          WHEN regexp_replace(description, ' ', ' ', 'g') ILIKE '%The average processing%' 
          THEN 'Yes'
          ELSE 'No'
      END AS "SLA text",
      regexp_replace(description, ' ', ' ', 'g') text
  FROM 
      test
) t

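Here the whole description is replaced by the numbers captured from your pattern, in the form reduced/regular. When the second number is missing, the result starts with a slash, which trim removes.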
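
To see what the replacement does in isolation, here is a minimal sketch on two literal strings. The sample strings and the shortened pattern are made up for illustration; only regexp_replace, trim and the '\2/\1' replacement come from the query above.

```sql
-- Illustrative only: hypothetical sample strings and a shortened pattern.
-- \1 is the regular delay, \2 the optional reduced delay.
SELECT s,
       regexp_replace(s, '.*is (\d+) days\.(?: reduced to (\d+) day)?.*', '\2/\1') AS raw,
       trim(leading '/' from
            regexp_replace(s, '.*is (\d+) days\.(?: reduced to (\d+) day)?.*', '\2/\1')) AS days
FROM (VALUES
        ('x is 5 days. y'),
        ('x is 5 days. reduced to 1 day z')
     ) AS v(s);
```

For the first string the optional group does not take part in the match, so \2 is empty: raw should be '/5' and the trim leaves '5'. For the second string both columns should be '1/5'. This assumes standard_conforming_strings is on (the default), so the backslashes in the pattern are taken literally.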

fruv7luv2#

You can use substring with a regular expression:

SELECT 
    id,
    CASE 
        WHEN regexp_replace(description, ' ', ' ', 'g') ILIKE '%The average processing%' 
        THEN 'Yes'
        ELSE 'No'
    END AS "SLA text",
  substring(description, '([0-9]*) day') days
FROM 
    test
where substring(description, '([0-9]*) day') IS NULL
| id | SLA text | days |
| -------------- | -------------- | -------------- |
| 1 | No | null |
SELECT 1
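
The WHERE clause above works because substring(string, pattern) returns NULL when the pattern does not match, and otherwise only the capture from the first match. A small sketch on literal strings (the sample strings are invented for illustration):

```sql
-- Illustrative only: hypothetical sample strings, same pattern as the query above.
SELECT substring('is 5 days. reduced to 1 day', '([0-9]*) day') AS first_match,  -- only the first number
       substring('Some text', '([0-9]*) day')                   AS no_match;     -- NULL: no " day" in the text
```

Since a second number would be ignored this way, the next query collects every match with regexp_matches and the 'g' flag, and keeps the substring branch only for rows without any match.
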
SELECT 
    id,
    CASE 
        WHEN regexp_replace(description, ' ', ' ', 'g') ILIKE '%The average processing%' 
        THEN 'Yes'
        ELSE 'No'
    END AS "SLA text",
  substring(description, '([0-9]*) day') days
FROM 
    test
where substring(description, '([0-9]*) day') IS NULL
UNION ALL
select 
  id,
    MAX(CASE 
        WHEN regexp_replace(description, ' ', ' ', 'g') ILIKE '%The average processing%' 
        THEN 'Yes'
        ELSE 'No'
    END) AS "SLA text",  
  STRING_AGG(match[1], '/' ORDER BY match[1]) as days
from test
cross join lateral regexp_matches(description, '([0-9]*) day', 'g') as match
Group by id
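| id | SLA text | days |
| -------------- | -------------- | -------------- |
| 1 | No | null |
| 2 | Yes | 5 |
| 3 | Yes | 1/5 |
| 4 | Yes | 2/7 |
| 5 | Yes | 3 |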
SELECT 5
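
If the UNION ALL feels heavy, the two branches can probably be folded into one query with LEFT JOIN LATERAL, which keeps rows without any match. This is a sketch under a few assumptions, not a drop-in replacement: the CTE sample and the alias m(nums) are made up for illustration, the pattern uses [0-9]+ instead of [0-9]* so that a bare " day" cannot produce an empty match, and the numbers are cast to int so that 10 sorts after 2 (ordering the text values would put '10' first). Like the query above, it assumes the reduced delay is always the smaller number.

```sql
-- Hypothetical sample rows standing in for the test table.
WITH sample(id, description) AS (
  VALUES
    (1, 'Some text'),
    (2, 'The average processing time for this type of request is 5 days.'),
    (3, 'The average processing time for this type of request is 5 days.
This delay is reduced to 1 day for requests with high or critical priority.')
)
SELECT s.id,
       CASE WHEN s.description ILIKE '%The average processing%'
            THEN 'Yes' ELSE 'No'
       END AS "SLA text",
       -- string_agg skips NULLs, so ids without a match get NULL days
       string_agg(m.nums[1], '/' ORDER BY m.nums[1]::int) AS days
FROM sample AS s
LEFT JOIN LATERAL regexp_matches(s.description, '([0-9]+) day', 'g') AS m(nums) ON true
GROUP BY s.id, s.description
ORDER BY s.id;
```

With these three rows the days column should come out as NULL, 5 and 1/5.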
