Extracting SLA information from a PostgreSQL table with text matching and value extraction

Posted by 4zcjmb1e on 2023-04-11 in PostgreSQL


I have the following PostgreSQL table:

CREATE TABLE test (
  id INT,
  description TEXT
);

INSERT INTO test VALUES 
(1, 'Some text'),
(2, '123 blabla The&nbsp;average processing time for this type of request is 5 days. blabla'),
(3, 'blalbla The&nbsp;average processing time for this type of request is 5 days.
This delay is reduced to 1 day for requests with high or critical priority. blabla'),
(4, 'blalbla The average processing time for this type of request is 7 days.
This delay is reduced to 2 day for requests with high or critical priority. blabla'),
(5, 'blabla The average processing time for this type of request is 3 days. blabla');

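I need to get the following output:

| id | SLA text | days |
| --- | --- | --- |
| 1 | No | |
| 2 | Yes | 5 |
| 3 | Yes | 1/5 |
| 4 | Yes | 2/7 |
| 5 | Yes | 3 |

Currently, I can determine whether the field contains an SLA value with the following query: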

SELECT 
    id,
    CASE 
        WHEN regexp_replace(description, '&nbsp;', ' ', 'g') ILIKE '%The average processing%' 
        THEN 'Yes'
        ELSE 'No'
    END AS "SLA text"
FROM 
    test

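I need to check whether the description field contains one of the following texts:

The average processing time for this type of request is 5 days.
The average processing time for this type of request is 5 days. This delay is reduced to 1 day for requests with high or critical priority.

If either text is present, the field (SLA text) should be marked "Yes"; otherwise it should be marked "No".
If the field is marked "Yes", the integer values should be retrieved from the text.
For example, if the text is "blabla The average processing time for this type of request is 5 days. blabla", the value to retrieve is 5. If the text is "The average processing time for this type of request is 5 days. This delay is reduced to 1 day for requests with high or critical priority.", the value to retrieve is 1/5. The retrieved value should be stored in the days column.
Demo: https://www.db-fiddle.com/f/iNxLeZosApNzTyp9RNTK4r/1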

postgresql

Source: https://stackoverflow.com/questions/75896090/extracting-sla-information-from-postgresql-table-with-text-matching-and-value-ex

2 answers


#1 0pizxfdo

You can also do this with regexp_replace:

select id, "SLA text",
    case when "SLA text" = 'Yes' then
        trim(leading '/' from regexp_replace(text,'(?:.*The average processing time for this type of request is (\d+) days\.)(?:
This delay is reduced to (\d+) day for requests with high or critical priority.)?.*' , '\2/\1', 'g'))
    else '' end "SLA2 text"
from(
  SELECT 
      id,
      CASE 
          WHEN regexp_replace(description, '&nbsp;', ' ', 'g') ILIKE '%The average processing%' 
          THEN 'Yes'
          ELSE 'No'
      END AS "SLA text",
      regexp_replace(description, '&nbsp;', ' ', 'g') text
  FROM 
      test
) t

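Here the whole description is replaced by the numbers captured by the pattern, written as second number, slash, first number. When the second number is missing, the result starts with a slash, which trim then removes.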
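The two steps described above can be looked at in isolation on literal strings. The following is a minimal sketch, not part of the original answer: the sample strings and the aliases `v`, `s`, `p`, `pattern`, `raw_result` and `days` are made up for illustration, and it uses `\s+` between the two sentences instead of a literal line break, on the assumption that any whitespace may separate them.

```sql
-- Sketch: show the raw replacement next to the trimmed result.
-- Sample rows and all aliases are illustrative, not from the original answer.
SELECT
    regexp_replace(s, pattern, '\2/\1') AS raw_result,
    trim(leading '/' from regexp_replace(s, pattern, '\2/\1')) AS days
FROM (VALUES
    -- Only the first sentence is present.
    ('x The average processing time for this type of request is 5 days. y'),
    -- Both sentences, separated by a newline (E'' enables the \n escape).
    (E'x The average processing time for this type of request is 5 days.\nThis delay is reduced to 1 day for requests with high or critical priority. y')
) AS v(s)
CROSS JOIN (VALUES (
    -- Group 1: average delay. Group 2 (optional): reduced delay.
    '.*The average processing time for this type of request is (\d+) days\.(?:\s+This delay is reduced to (\d+) day for requests with high or critical priority\.)?.*'
)) AS p(pattern);
```

For the first row the optional group does not participate, so `\2` expands to nothing and `raw_result` should be `/5`, which the trim turns into `5`; for the second row both columns should read `1/5`.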

2023-04-11

#2 fruv7luv

You can use substring with a regular expression:

SELECT 
    id,
    CASE 
        WHEN regexp_replace(description, '&nbsp;', ' ', 'g') ILIKE '%The average processing%'
        THEN 'Yes'
        ELSE 'No'
    END AS "SLA text",
  substring(description, '([0-9]*) day') days
FROM 
    test
where substring(description, '([0-9]*) day') IS NULL

id	SLA text	days
1	No	null

SELECT 1

SELECT 
    id,
    CASE 
        WHEN regexp_replace(description, '&nbsp;', ' ', 'g') ILIKE '%The average processing%'
        THEN 'Yes'
        ELSE 'No'
    END AS "SLA text",
  substring(description, '([0-9]*) day') days
FROM 
    test
where substring(description, '([0-9]*) day') IS NULL
UNION ALL
select 
  id,
    MAX(CASE 
        WHEN regexp_replace(description, '&nbsp;', ' ', 'g') ILIKE '%The average processing%'
        THEN 'Yes'
        ELSE 'No'
    END) AS "SLA text",  
  STRING_AGG(match[1], '/' ORDER BY match[1]) as days
from test
cross join lateral regexp_matches(description, '([0-9]*) day', 'g') as match
Group by id

id	SLA text	days
1	No	null
2	Yes	5
3	Yes	1/5
4	Yes	2/7
5	Yes	3

SELECT 5

fiddle
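The UNION ALL is needed only because the lateral join drops rows for which regexp_matches returns nothing. As a hedged alternative, not taken from the original answer, the same result might be produced in a single pass with a correlated subquery. This sketch assumes the `test` table from the question, uses `[0-9]+` so that at least one digit is required, and orders the numbers numerically on the assumption that the reduced delay is always the smaller value; the aliases `t` and `m` are illustrative.

```sql
-- Sketch: one query instead of UNION ALL (aliases t and m are illustrative).
SELECT
    t.id,
    CASE
        WHEN regexp_replace(t.description, '&nbsp;', ' ', 'g') ILIKE '%The average processing%'
        THEN 'Yes'
        ELSE 'No'
    END AS "SLA text",
    -- Collect every "<number> day" occurrence; yields NULL when there is none.
    (SELECT string_agg(m[1], '/' ORDER BY m[1]::int)
     FROM regexp_matches(t.description, '([0-9]+) day', 'g') AS m) AS days
FROM test AS t
ORDER BY t.id;
```

Like the original query, this picks up any number followed by " day" anywhere in the description, not only the ones in the SLA sentences, so it relies on the descriptions containing no other such phrases.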

2023-04-11
