Return the whole match instead of the capture group (regex, BigQuery)

t5fffqht, posted on 2023-10-22 in Other

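I'm new to regular expressions and BigQuery. I want to extract an age range from a string and expose it as a column in a table, so I wrote something like this: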

WITH example AS (
SELECT
"Bgfks Falda Tutú De Tul De 5 Capas Para Niñas Con Moño Para El Pelo, Falda Tutú Para Niños - 8-15 Years Old - Rosado" AS name
)
SELECT REGEXP_SUBSTR(LOWER(name),r"[0-9]{1,2}( - | a | to |-|a|to)[0-9]{1,2}") as AGE_RANGE FROM example

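The group has several alternatives ( - | a | to |-|a|to) because these are the forms that can appear in the data: "1-10", "1 - 10", "1 a 10", "1 to 10", and so on.
But this only returns the capture group "-", and what I want is "8-15". Can anyone help me figure out what I'm doing wrong?
I checked the regular expression on a regex testing page, and I think I'm getting "Group 1" when what I want is "Match 1".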

4c8rllxm #1

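You can do this with a non-capturing group. When a pattern contains a capturing group, REGEXP_SUBSTR returns the text matched by that group instead of the whole match, which is why you get only the separator. BigQuery uses RE2 regular expression syntax, where a non-capturing group is written (?:re):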

REGEXP_SUBSTR(LOWER(name),r"[0-9]{1,2}(?: - | a | to |-|a|to)[0-9]{1,2}")

Output: 8-15

Documentation: RE2 syntax reference
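
To see the difference side by side, here is a minimal sketch that runs both patterns over a shortened version of the example string. The column names with_capture_group and whole_match are made up for illustration. With the capturing group, the first column should come back as just the separator "-", as described in the question; with the non-capturing group, the second column should be the full "8-15".

```sql
WITH example AS (
  SELECT "Falda Tutú Para Niños - 8-15 Years Old - Rosado" AS name
)
SELECT
  -- Capturing group (...): only the text matched inside the group is returned.
  REGEXP_SUBSTR(LOWER(name), r"[0-9]{1,2}( - | a | to |-|a|to)[0-9]{1,2}") AS with_capture_group,
  -- Non-capturing group (?:...): the whole match is returned.
  REGEXP_SUBSTR(LOWER(name), r"[0-9]{1,2}(?: - | a | to |-|a|to)[0-9]{1,2}") AS whole_match
FROM example
```

The only change between the two calls is the ?: after the opening parenthesis, so the alternation still matches the same separators.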

mm9b1k5b #2

You can use the following:
REGEXP_EXTRACT(name, r'(?i)\d+ ?(?:-|a|to) ?\d+')
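
A quick way to check a pattern like this is to run it over a few variants of the separator. The sketch below assumes the pattern is meant to read (?i)\d+ ?(?:-|a|to) ?\d+, that is, digits, an optional space, one of the separators, another optional space, then digits. The samples CTE and its rows are illustrative, not taken from real data.

```sql
WITH samples AS (
  SELECT "Falda - 8-15 Years Old" AS name UNION ALL
  SELECT "Falda - 8 - 15 Years Old" UNION ALL
  SELECT "Falda - 8 to 15 Years Old" UNION ALL
  SELECT "Falda - 8TO15 Years Old" UNION ALL
  SELECT "Falda - 8a15 Years Old"
)
SELECT
  name,
  -- (?i) makes the match case-insensitive, so LOWER(name) is not needed.
  -- (?:...) keeps the group non-capturing, so the whole match is returned.
  REGEXP_EXTRACT(name, r'(?i)\d+ ?(?:-|a|to) ?\d+') AS age_range
FROM samples
```

Each row should yield its range with the original spacing and casing preserved (for example "8 - 15" rather than "8-15"), so normalize the result afterwards if you need a single canonical form.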

nwlls2ji #3

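If your goal is to extract an age range from free text, you can use the approach below, which relies on the ML.GENERATE_TEXT function. It lets you run generative natural-language tasks on text stored in BigQuery tables using the Vertex AI text-bison natural language foundation model.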

CREATE TEMP FUNCTION EXTRACT_AGE(text STRING) AS ((
SELECT TRIM(STRING(ml_generate_text_result['predictions'][0]['content']), '" ')
  FROM ML.GENERATE_TEXT(MODEL `your_project.your_dataset.your_model_llm`,
  (SELECT FORMAT('Extract age range from following text - %s. Return ONLY range in form of start age-end age', text)  AS prompt)
)));
SELECT name, EXTRACT_AGE(name) AS AGE_RANGE
FROM your_table

If applied to the following sample data:

CREATE TEMP FUNCTION EXTRACT_AGE(text STRING) AS ((
SELECT TRIM(STRING(ml_generate_text_result['predictions'][0]['content']), '" ')
  FROM ML.GENERATE_TEXT(MODEL `your_project.your_dataset.your_model_llm`,
  (SELECT FORMAT('Extract age range from following text - %s. Return ONLY range in form of start age-end age', text)  AS prompt)
)));
WITH example AS (
  SELECT "Falda Tutú Para Niños - 8-15 Years Old - Rosado" AS name UNION ALL
  SELECT "Falda Tutú Para Niños - 8 - 15 Years Old - Rosado" AS name UNION ALL
  SELECT "Falda Tutú Para Niños - 8 to 15 Years Old - Rosado" AS name UNION ALL
  SELECT "Falda Tutú Para Niños - 8TO15 Years Old - Rosado" AS name UNION ALL
  SELECT "Falda Tutú Para Niños - 8a15 Years Old - Rosado" AS name
)
SELECT name, EXTRACT_AGE(name) AS AGE_RANGE
FROM example

with output: [result image not reproduced]
