Regex: match the second occurrence if present, otherwise the first

I have a JSON output in Zabbix that looks like this:

{
  "body": {
    "metricsArray": [
      {
        "name": "free-aa-bb2-123x123Profiles",
        "units": "profiles",
        "value": 14
      },
      {
        "name": "free_aa_bb2_123x123Profiles",
        "units": "profiles",
        "value": 14
      }
    ],
    "name": "regionxxx",
    "timeStamp": "2022-01-20T04:58:29.875Z"
  }
}

I used this regex:

"free[_-]aa[_-]bb2[_-]123x123Profiles"[^}]*

The output I want is

"free_aa_bb2_123x123Profiles","units":"profiles","value":14

if both free-aa-bb2-123x123Profiles and free_aa_bb2_123x123Profiles exist,
or:

"free-aa-bb2-123x123Profiles","units":"profiles","value":14

if only free-aa-bb2-123x123Profiles exists,
or:

"free_aa_bb2_123x123Profiles","units":"profiles","value":14

if free_aa_bb2_123x123Profiles exists.
But the output I get is always:

"free-aa-bb2-123x123Profiles","units":"profiles","value":14

Thanks in advance.

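This is an interesting question from the angle of obtaining the desired match with a regular expression, although it would be preferable to convert the JSON string to a hash and work from the hash.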
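As a point of comparison, the hash-based route might look like the following minimal Python sketch. It assumes the payload has the shape shown in the question; `SAMPLE` and `pick_metric` are hypothetical names introduced only for illustration, and the rule "prefer the underscore name, fall back to the hyphen name" is my reading of the question.

```python
import json

# Sample payload copied from the question.
SAMPLE = """
{
  "body": {
    "metricsArray": [
      {"name": "free-aa-bb2-123x123Profiles", "units": "profiles", "value": 14},
      {"name": "free_aa_bb2_123x123Profiles", "units": "profiles", "value": 14}
    ],
    "name": "regionxxx",
    "timeStamp": "2022-01-20T04:58:29.875Z"
  }
}
"""

def pick_metric(payload, hyphen_name):
    """Return the underscore-named metric if present, else the hyphen-named
    one, else None. (Hypothetical helper, not part of Zabbix.)"""
    metrics = json.loads(payload)["body"]["metricsArray"]
    by_name = {m["name"]: m for m in metrics}
    underscore_name = hyphen_name.replace("-", "_")
    if underscore_name in by_name:
        return by_name[underscore_name]
    return by_name.get(hyphen_name)

chosen = pick_metric(SAMPLE, "free-aa-bb2-123x123Profiles")
```

Working on the parsed structure avoids any dependence on whitespace or key order in the serialized JSON, which the regex approach has to assume.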
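The regular expression below will match zero, one or two substrings. If there is at least one match, the first match is the one of interest. (If there are two matches, disregard the second.)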
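In the example given in the question, I assume the values of "name" ("free-aa-bb2-123x123Profiles" and "free_aa_bb2_123x123Profiles") are placeholders for strings made up of four substrings, separated by hyphens in one and by underscores in the other, where each substring consists of word characters (letters, digits and underscores; one or more of them is written \w+ in a regular expression).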
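I further assume that the "hyphen" hash representation is the one of interest (and is therefore the first match) if it is not followed later by a "underscore" hash representation that is identical except that the hyphens are replaced by underscores; otherwise the underscore hash representation is the only match. In the example, the hash representation for "free_aa_bb2_123x123Profiles" would be selected. If, however, that string were changed to "zzzz_aa_bb2_123x123Profiles", then "free-aa-bb2-123x123Profiles" would be the first match, so it would be selected.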
Note that Zabbix uses the PCRE regex engine.
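You can match the regular expression below, which I have written in extended mode (invoked with the x flag), sometimes called free-spacing mode. That mode permits comments, which make the expression self-documenting, and extra whitespace, which improves readability. In this mode the regex engine strips comments and whitespace before parsing the expression. It is therefore necessary to protect any spaces that are meant to be part of the expression, which would not be needed outside extended mode. That is usually done by putting the space in a character class ([ ]), which is what I have done below, or by escaping it (\ ).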
I have also invoked single-line (or DOTALL) mode (with the s flag), which causes . to match every character (without it, . does not match line terminators).
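As a small aside before the full expression, the effect of the two flags can be checked in isolation. The sketch below uses Python's built-in re module rather than PCRE (re.VERBOSE and re.DOTALL correspond to the x and s flags); the variable names and the two-line `text` string are made up for illustration.

```python
import re

text = '"name": "a"\n"units": "b"'

# In verbose (x) mode an unprotected space in the pattern is ignored,
# so this pattern is effectively '"name":"a"' and finds nothing.
unprotected = re.compile(r'"name": "a"', re.VERBOSE)

# Putting the space in a character class keeps it; the trailing
# whitespace and comment are stripped from the pattern.
protected = re.compile(r'"name":[ ]"a"   # literal space kept', re.VERBOSE)

# Without DOTALL (s), '.' stops at the newline between the two lines.
without_dotall = re.search(r'"a".*"units"', text)
with_dotall = re.search(r'"a".*"units"', text, re.DOTALL)
```

Here `unprotected.search(text)` and `without_dotall` are both `None`, while `protected.search(text)` and `with_dotall` are match objects.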

\{\s+"name":[ ]"     # match '{', then 1+ whitespace chars, then '"name": "'
(\w+)                # match 1+ word chars, save to capture group 1
-                    # match '-'
(\w+)                # match 1+ word chars, save to capture group 2
-                    # match '-'
(\w+)                # match 1+ word chars, save to capture group 3
-                    # match '-'
(\w+)                # match 1+ word chars, save to capture group 4
(                    # begin capture group 5
  ",\s+"units":[ ]   # match '",', then 1+ whitespaces then '"units": '
  "\w+"              # match '"', then 1+ word chars, then '"'
  ,\s+"value":[ ]    # match ',', then 1+ whitespaces then '"value": '
  \d+                # match 1+ digits
  \s+\}              # match 1+ whitespaces then '}'
)                    # end capture group 5
(?!                  # begin negative lookahead
  .*                 # match 0+ chars
  \{\s+"name":[ ]"   # match '{' then 1+ whitespace chars then '"name": ' then '"'
  \1_                # match contents of capture group 1 then '_'
  \2_                # match contents of capture group 2 then '_'
  \3_                # match contents of capture group 3 then '_'
  \4                 # match contents of capture group 4
  \5                 # match contents of capture group 5
)                    # end of negative lookahead 
|                    # or
\{\s+"name":[ ]"     # match '{', then 1+ whitespace chars, then '"name": "'
\w+                  # match 1+ word chars
_                    # match '_'
\w+                  # match 1+ word chars
_                    # match '_'
\w+                  # match 1+ word chars
_                    # match '_'
\w+                  # match 1+ word chars
(?5)                 # invoke the subpattern of capture group 5 (subroutine call)

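In the above, \1 (for example) requires that the contents of capture group 1 be matched at the current string position. By contrast, (?5) instructs the engine to invoke, at the current string position, the code that was used to capture the contents of capture group 5. This is called a regex subroutine or subexpression call.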
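For anyone who wants to experiment outside Zabbix, here is a rough port of the expression to Python's built-in re module. It is a sketch rather than the PCRE expression verbatim: re has no subroutine calls, so the (?5) reuse is replaced by splicing the same pattern text into both alternatives. `TAIL`, `PATTERN`, `SAMPLE` and `first_metric` are hypothetical names introduced for illustration.

```python
import re

# Sample payload copied from the question (both objects use identical layout).
SAMPLE = """{
  "body": {
    "metricsArray": [
      {
        "name": "free-aa-bb2-123x123Profiles",
        "units": "profiles",
        "value": 14
      },
      {
        "name": "free_aa_bb2_123x123Profiles",
        "units": "profiles",
        "value": 14
      }
    ],
    "name": "regionxxx",
    "timeStamp": "2022-01-20T04:58:29.875Z"
  }
}"""

# Pattern text for the rest of a metric object. PCRE reuses it with (?5);
# here it is inserted twice as plain text instead.
TAIL = r'",\s+"units":[ ]"\w+",\s+"value":[ ]\d+\s+\}'

PATTERN = re.compile(
    r'''
    \{\s+"name":[ ]"
    (\w+)-(\w+)-(\w+)-(\w+)     # groups 1-4: the hyphen-separated parts
    (''' + TAIL + r''')         # group 5: rest of the object
    (?!                         # fail if a later object has the same name
      .*\{\s+"name":[ ]"        # with underscores in place of hyphens
      \1_\2_\3_\4\5
    )
    |
    \{\s+"name":[ ]"
    \w+_\w+_\w+_\w+             # underscore-separated name
    ''' + TAIL,
    re.VERBOSE | re.DOTALL,
)

def first_metric(text):
    """Return the text of the first matching metric object, or None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None

selected = first_metric(SAMPLE)
```

Because \5 must repeat the captured text exactly, the lookahead only rejects the hyphen object when the underscore object has the same units, value and whitespace, which is the same assumption the PCRE version makes.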
