PCRE Regex - Match everything up to the first pipe not enclosed in square brackets

5tmbdcev  posted on 2023-05-08  in Other

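I have the following line of text, and I am trying to extract everything up to the first pipe character that is not enclosed in square brackets.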

action=search sourcetype=audittrail [ localop | stats count | eval search_id = replace("$top10_drilldown_sid$", "^remote_[^_]*_", "") | table search_id ] [ localop | stats count | eval earliest = $top10_drilldown_earliest$ - 86400 | table earliest ] latest="$top10_drilldown_latest$" | stats values(savedsearch_name) AS search_name

Expected output:

action=search sourcetype=audittrail [ localop | stats count | eval search_id = replace("$top10_drilldown_sid$", "^remote_[^_]*_", "") | table search_id ] [ localop | stats count | eval earliest = $top10_drilldown_earliest$ - 86400 | table earliest ] latest="$top10_drilldown_latest$"

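That is, everything except the trailing | stats values(savedsearch_name) AS search_name
Working from a few examples, I can (almost) get what I need with a JavaScript regex:
/.*\|(?![^\[]*\])/g
However, this does not translate well into a PCRE-compatible expression (also, I want to capture everything before the first pipe, but not the pipe itself).
From what I have read, the nested square brackets inside the first bracket may be a complication that cannot be solved? Any given set will contain only one level of nested brackets (e.g. [..[]..][..[]..[]..]).
I admit I don't think I have fully wrapped my head around positive and negative lookarounds, but any help would be greatly appreciated!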

rkue9o1l1#

In this kind of situation, it is easier to match everything that is not a delimiter than to try to split:

(?=[^|])[^][|]*(?:(\[[^][]*+(?:(?1)[^][]*)*+])[^][|]*)*

Details:

(?=[^|]) # lookahead: ensure there is at least one non-pipe character at the
         # current position; the goal is to avoid an empty match.
[^][|]* # all that isn't a bracket or a pipe
(?:
    (  # open the capture group 1: describe a bracket part
        \[
         [^][]*+ # all that isn't a bracket (note that you don't have to care
                 # about the pipe here, since you are between brackets)
         (?:
             (?1)  # refer to the capture group 1 subpattern (it's a recursion
                   # since this reference is in the capture group 1 itself)
             [^][]* 
         )*+
         ]
    ) # close the capture group 1
    [^][|]*
)*
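
The same idea can be sketched in Python as a rough illustration. Python's built-in `re` module does not support the `(?1)` recursion used above, so this sketch swaps the recursion for an unrolled pattern that handles exactly one level of nesting, which is the limit stated in the question. The names `NOT_SPECIAL`, `INNER`, `OUTER`, `SEGMENT` and `first_segment` are hypothetical and exist only for this example.

```python
import re

# Assumption: brackets nest at most one level deep, as stated in the question.
NOT_SPECIAL = r"[^\[\]|]*"                  # anything but brackets and pipes
INNER = r"\[[^\[\]]*\]"                     # innermost [...] with no brackets inside
OUTER = r"\[(?:[^\[\]]|" + INNER + r")*\]"  # [...] that may contain INNER groups

# Same shape as the PCRE pattern: plain text, then (bracket group, plain text)*.
SEGMENT = re.compile(
    r"(?=[^|])" + NOT_SPECIAL + r"(?:" + OUTER + NOT_SPECIAL + r")*"
)

def first_segment(text):
    """Return the text before the first pipe that is outside brackets."""
    match = SEGMENT.match(text)
    # The match includes the whitespace just before the pipe, so trim it.
    return match.group(0).rstrip() if match else ""

# The sample line from the question.
line = (
    'action=search sourcetype=audittrail '
    '[ localop | stats count | eval search_id = '
    'replace("$top10_drilldown_sid$", "^remote_[^_]*_", "") | table search_id ] '
    '[ localop | stats count | eval earliest = '
    '$top10_drilldown_earliest$ - 86400 | table earliest ] '
    'latest="$top10_drilldown_latest$" '
    '| stats values(savedsearch_name) AS search_name'
)

print(first_segment(line))
```

Pipes inside a bracket group are consumed by `OUTER`, so only a pipe at bracket depth zero can end the match. Deeper nesting would need either the recursive PCRE pattern itself or another layer of unrolling.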

If you also need the empty parts, you can rewrite it like this:

(?=[^|])[^][|]*(?:(\[[^][]*+(?:(?1)[^][]*)*+])[^][|]*)*|(?<=\|)
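
As a hedged illustration of how the extra `|(?<=\|)` branch behaves, here is a small Python sketch. It again assumes at most one level of bracket nesting and unrolls the recursion, because Python's built-in `re` module has no `(?1)` support. The names `SEGMENT_OR_EMPTY` and `split_segments` are hypothetical.

```python
import re

# Assumption: brackets nest at most one level deep, as stated in the question.
NOT_SPECIAL = r"[^\[\]|]*"                  # anything but brackets and pipes
INNER = r"\[[^\[\]]*\]"                     # innermost [...] with no brackets inside
OUTER = r"\[(?:[^\[\]]|" + INNER + r")*\]"  # [...] that may contain INNER groups

SEGMENT_OR_EMPTY = re.compile(
    # First branch: a non-empty segment, as in the original pattern.
    r"(?=[^|])" + NOT_SPECIAL + r"(?:" + OUTER + NOT_SPECIAL + r")*"
    # Second branch: an empty match at a position directly after a pipe.
    + r"|(?<=\|)"
)

def split_segments(text):
    """Return all segments separated by pipes that are outside brackets."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return SEGMENT_OR_EMPTY.findall(text)

print(split_segments("a|[b|c]||d"))
```

The first branch is tried first, so the empty alternative only succeeds where a pipe is directly followed by another pipe or by the end of the text.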
