regex BigQuery SQL中的多分隔符处理

lpwwtiir  于 2023-05-01  发布在  其他
关注(0)|答案(3)|浏览(98)

我需要bigquery sql来处理具有外层分隔符^G和内层分隔符空间的列,例如,如果输入列是a 11^Gb 22^Gc 33,那么我们应该给予两个输出列a^Gb^Gc11^G22^G33。谢谢!

oxalkeyp

oxalkeyp1#

你可以考虑下面

-- sample data
WITH sample_table AS (
 SELECT 'a 11^Gb 22^Gc 33' str UNION ALL
 SELECT 'a AA^Gb BB^Gc CC' str
)
-- query starts here
SELECT str,
       (SELECT AS STRUCT
               STRING_AGG(SPLIT(out, ' ')[OFFSET(0)], '^G') col1,
               STRING_AGG(SPLIT(out, ' ')[OFFSET(1)], '^G') col2
          FROM UNNEST(SPLIT(str, '^G')) out
       ).*
  FROM sample_table;

-- query result
+------------------+---------+------------+
|       str        |  col1   |    col2    |
+------------------+---------+------------+
| a 11^Gb 22^Gc 33 | a^Gb^Gc | 11^G22^G33 |
| a AA^Gb BB^Gc CC | a^Gb^Gc | AA^GBB^GCC |
+------------------+---------+------------+
mwyxok5s

mwyxok5s2#

另一个选择

select str, 
  array_to_string(regexp_extract_all(str, r'(?:([^\s]+) [^\s]*?)(?:\^G|$)'), '^G') as col1,
  array_to_string(regexp_extract_all(str, r'(?:[^\s]+ ([^\s]*?))(?:\^G|$)'), '^G') as col2
from your_table

输出如

arknldoa

arknldoa3#

您可以使用regex_replace()替换数字或小写字母,并使用replace()删除空格

select replace(regexp_replace(colname, '[0-9]', ''),' ','')column1, replace(regexp_replace(colname, '[a-z]', ''),' ','')column2

相关问题