Regex: match comments unless the initiating character is surrounded by unescaped quotes

Posted by xoshrz7s on 2022-12-05 in Other

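Using a regular expression, how can I match comments that start with a semicolon, unless the semicolon is surrounded on both sides by unescaped quotes, as shown below (green blocks mark the matched comments)?

Note that data quotes can be escaped by doubling them (""). Such escaped data quotes behave as entirely different characters, i.e. they cannot surround a semicolon and disable its comment-starting function.
Also, unbalanced data quotes are treated as escaped data quotes.
With Bubble's help I have got as far as the regex below, which does not correctly handle the trailing escaped dquote in the last test vector line.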

^(?>(?:""[^""\n]*""|[^;""\n]+)*)""?[^"";\n]*(;.*)

See it in action here.
Test vectors (the same as in the color-coded image above):

Peekaboo ; A comment starts with a semicolon and continues till the EOL
Unless the semicolon is surrounded by dquotes "Don't do it ; here" ;but match me; once
Im not surrounded "so pay attention to me" ; "peekaboo"
Im not surrounded "so pay attention" to;me" ; "peekaboo"
Im not surrounded "so pay attention to me ; peekaboo
Dquote escapes a dquote so "dont pay attention to ""me;here"" buster" do it ; here
Don't pay attention to  """me;here""" but do ""it;here""
and "dont do ""it;here"""  either ;peekaboo
but "pay attention to "it;here"" ;not here though
Simon said "I like goats" then he added "and sheep;" ;a good comment is "here
Simon said "I like goats" then he added "and sheep;" dont do it here
Simon said ""I like goats;"peekaboo
Simon said "I like goats;""peekaboo

regex

Source: https://stackoverflow.com/questions/74636553/match-comments-unless-the-initiating-character-is-surrounded-by-unescaped-quotes

1 Answer


2wnc66cl1#

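The task is to find comments that start with a ; semicolon, taking "" escaped quotes and a possible preceding unclosed quote into account. This approach works for the test cases provided so far.

Updated pattern: a shorter, more efficient variant without alternation.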

^((?>(?:(?:[^"\n;]*"[^"\n]*")+(?!"))?[^"\n;]*)"?[^"\n;]*);.*

New demo at regex101
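This pattern needs no alternation and uses a negative lookahead to check the last valid double quote. In both patterns the atomic group mimics possessive quantifiers, preventing any backtracking and keeping the quotes balanced. With possessive quantifiers, the pattern looks like this regex101 demo. [^";\n]*"?[^";\n]* is the part that allows an optional unclosed quote.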
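To make the backtracking point concrete, here is a minimal Python sketch. It is not part of the answer: the input `abbb` and the helper `matches` are made up for illustration. Python's `re` module accepts atomic groups and possessive quantifiers from version 3.11 on, so those two checks sit behind a version guard; the lookahead-plus-backreference form is a common emulation that also runs on older versions.

```python
import re
import sys


def matches(pattern, text='abbb'):
    """Return True when pattern matches the whole of text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# Ordinary greedy quantifier: b+ first takes "bbb", then gives one "b" back
# so that the trailing b can still match.
print('greedy    ', matches(r'ab+b'))

# Lookahead + backreference: the lookahead captures "bbb" and is never
# re-entered, so nothing is given back and the trailing b cannot match.
print('emulated  ', matches(r'a(?=(b+))\1b'))

if sys.version_info >= (3, 11):
    # Atomic group and possessive quantifier: both refuse to backtrack.
    print('atomic    ', matches(r'a(?>b+)b'))
    print('possessive', matches(r'ab++b'))
```

Only the plain greedy pattern should report a match. The other forms keep everything `b+` consumed, which is the property the answer relies on to stop the engine from re-pairing quotes after a failed attempt.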

Previous pattern: this one turned out to be reliable, but slightly slower.

^((?>(?:(?:[^;"\n]*"(?>(?:[^"\n]+|"")*)")+)?)[^";\n]*"?[^";\n]*);.*

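Old demo at regex101
"(([^"]+|"")*)" consumes either "..." or "". This is repeated any number of times, with any [^;"]* characters (anything that is neither ; nor ") in between. All of this happens inside an atomic group, so only non-semicolons are allowed between the quoted parts and nothing consumed there is given back. After finally allowing one optional unclosed ", either a ; is found or the match fails.

The first capturing group, $1, contains the part up to the target ; comment start. To remove the comments, replace the full match with the captured part. If needed, capture (.*) into a second group.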
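As a rough illustration of that replacement step, the sketch below applies the updated pattern in Python. Several things here are assumptions and not part of the answer: the names `split_comment`, `strip_comments`, `code`, `comment` and `balanced` are hypothetical, the pattern is split into two string pieces only so the atomic part can be swapped out, and named groups stand in for $1 and the second group. On Python older than 3.11, which lacks `(?>...)`, the atomic group is emulated with a lookahead plus backreference.

```python
import re
import sys

# Body of the answer's atomic group: balanced "..." runs, then plain text.
BALANCED = r'(?:(?:[^"\n;]*"[^"\n]*")+(?!"))?[^"\n;]*'
# Optional single unclosed quote followed by more non-semicolon text.
UNCLOSED = r'"?[^"\n;]*'

if sys.version_info >= (3, 11):
    ATOMIC = r'(?>' + BALANCED + r')'  # same construct as in the answer
else:
    # Assumption: emulate the atomic group with lookahead + backreference.
    ATOMIC = r'(?=(?P<balanced>' + BALANCED + r'))(?P=balanced)'

# "code" plays the role of $1; "comment" is the optional second group
# (here it includes the leading semicolon).
COMMENT_RE = re.compile(
    r'^(?P<code>' + ATOMIC + UNCLOSED + r')(?P<comment>;.*)', re.MULTILINE)


def split_comment(line):
    """Return (code, comment); comment is None when no comment is found."""
    m = COMMENT_RE.match(line)
    if m is None:
        return line, None
    return m.group('code'), m.group('comment')


def strip_comments(text):
    """Replace each full match with the captured code part."""
    return COMMENT_RE.sub(r'\g<code>', text)


if __name__ == '__main__':
    vectors = [
        'Peekaboo ; A comment starts with a semicolon and continues till the EOL',
        'Unless the semicolon is surrounded by dquotes "Don\'t do it ; here" ;but match me; once',
        'Im not surrounded "so pay attention to me ; peekaboo',
        'Simon said "I like goats" then he added "and sheep;" dont do it here',
    ]
    for line in vectors:
        print(split_comment(line))
    print(strip_comments('\n'.join(vectors)))
```

Whitespace before the semicolon stays in the code part, since the pattern stops exactly at the comment start; trim it separately if that matters.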

2022-12-05
