regex: Commenting regular expressions

Posted by 9rnv2umw on 2023-03-09 in Other


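I'm trying to comment regular expressions in JavaScript.
There seem to be many resources on how to *remove* comments from code using regex, but none on how to *comment* regular expressions in JavaScript so that they are easier to understand.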

regex

Source: https://stackoverflow.com/questions/15463257/commenting-regular-expressions

7 Answers


3pmvbmvn1#

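Unfortunately, JavaScript doesn't have a verbose mode for regular expression literals like some other languages do.
Without using any external libraries, your best bet is to build the pattern from ordinary strings and comment each piece: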

var r = new RegExp(
    '('      + // start capture group
    '[0-9]+' + // match one or more digits
    ')'        // end capture group
);
r.test('9'); // true
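
One caveat with this approach: the pattern is written as ordinary string literals, so every backslash meant for the regex has to be doubled. A minimal sketch of what that looks like (the `price` pattern is an illustrative assumption, not part of the answer above):

```javascript
// Illustrative pattern: a dollar amount such as "$12.50".
// Inside ordinary string literals, each regex backslash must be written twice.
const price = new RegExp(
    '^'        + // start of input
    '\\$'      + // literal dollar sign (regex: \$)
    '(\\d+)'   + // capture group 1: whole dollars (regex: \d+)
    '\\.'      + // literal decimal point (regex: \.)
    '(\\d{2})' + // capture group 2: cents (regex: \d{2})
    '$'          // end of input
);

console.log(price.source);          // ^\$(\d+)\.(\d{2})$
console.log(price.test('$12.50'));  // true
console.log(price.test('12.50'));   // false
```

Printing `price.source` is a quick way to check that the concatenated pieces add up to the pattern you intended.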


wydwbb8l2#

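While JavaScript doesn't natively support multi-line, commented regular expressions, it's easy to build something that does the same job: a function that takes a (multi-line, commented) string and returns a regular expression built from it, with the comments and newlines stripped.
The following snippet imitates the behavior of the x ("extended") flag found in other regex flavors, which ignores all whitespace characters in a pattern as well as comments, which are introduced with #: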

function makeExtendedRegExp(inputPatternStr, flags) {
  // Remove everything between the first unescaped `#` and the end of a line
  // and then remove all unescaped whitespace
  const cleanedPatternStr = inputPatternStr
    .replace(/(^|[^\\])#.*/g, '$1')
    .replace(/(^|[^\\])\s+/g, '$1');
  return new RegExp(cleanedPatternStr, flags);
}


// The following switches the first word with the second word:
const input = 'foo bar baz';
const pattern = makeExtendedRegExp(String.raw`
  ^       # match the beginning of the line
  (\w+)   # 1st capture group: match one or more word characters
  \s      # match a whitespace character
  (\w+)   # 2nd capture group: match one or more word characters
`);
console.log(input.replace(pattern, '$2 $1'));

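Ordinarily, to represent a backslash in a JavaScript string, each literal backslash must be escaped with a second one, e.g. str = 'abc\\def'. But regular expressions often use many backslashes, and doubling them makes the pattern much less readable, so when writing a JavaScript string with many backslashes it's a good idea to use a String.raw template literal. It lets a single typed backslash actually represent a literal backslash, with no extra escaping.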
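
To make the difference concrete, here is a small sketch comparing the two spellings (the variable names `escaped`, `raw`, and `broken` are only for illustration):

```javascript
// The same pattern text written two ways.
const escaped = '(\\w+)\\s(\\w+)';      // ordinary string: backslashes doubled
const raw = String.raw`(\w+)\s(\w+)`;   // String.raw: backslashes typed once

console.log(escaped === raw);           // true
console.log(raw.length);                // 12

// In an ordinary string, an unrecognized escape such as \w silently loses
// its backslash, so the regex meaning is lost as well.
const broken = '(\w+)\s(\w+)';
console.log(broken);                    // (w+)s(w+)
```

Keep in mind that String.raw still performs `${...}` interpolation, and a backtick inside the template has to be written as \` (which leaves the backslash in the resulting string).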
Just like with the standard x modifier, to match an actual # in the string, escape it first, e.g.

foo\#bar     # comments go here

// this function is exactly the same as the one in the first snippet

function makeExtendedRegExp(inputPatternStr, flags) {
  // Remove everything between the first unescaped `#` and the end of a line
  // and then remove all unescaped whitespace
  const cleanedPatternStr = inputPatternStr
    .replace(/(^|[^\\])#.*/g, '$1')
    .replace(/(^|[^\\])\s+/g, '$1');
  return new RegExp(cleanedPatternStr, flags);
}


// The following switches the first word with the second word:
const input = 'foo#bar baz';
const pattern = makeExtendedRegExp(String.raw`
  ^       # match the beginning of the line
  (\w+)   # 1st capture group: match one or more word characters
  \#      # match a hash character
  (\w+)   # 2nd capture group: match one or more word characters
`);
console.log(input.replace(pattern, '$2 $1'));

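Note that to match a literal space character (and not just any whitespace character) while using the x flag in any environment, including the one above, you have to escape the space with a \ first, e.g.: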

^(\S+)\ (\S+)   # capture the first two words

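If you need to match space characters often, this gets tedious and makes the pattern harder to read, much as doubled backslashes do. One possible (non-standard) modification is to strip only the leading and trailing whitespace on each line, plus any spaces right before a # comment: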

function makeExtendedRegExp(inputPatternStr, flags) {
  // Remove the first unescaped `#`, any preceding unescaped spaces, and everything that follows
  // and then remove leading and trailing whitespace on each line, including linebreaks
  const cleanedPatternStr = inputPatternStr
    .replace(/(^|[^\\]) *#.*/g, '$1')
    .replace(/^\s+|\s+$|\n/gm, '');
  console.log(cleanedPatternStr);
  return new RegExp(cleanedPatternStr, flags);
}


// The following switches the first word with the second word:
const input = 'foo bar baz';
const pattern = makeExtendedRegExp(String.raw`
  ^             # match the beginning of the line
  (\w+) (\w+)   # capture the first two words
`);
console.log(input.replace(pattern, '$2 $1'));


bmvo0sr53#

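In several other languages (notably Perl), there's a special x flag. When it is set, the regexp ignores any whitespace and comments inside it. Sadly, JavaScript regexps do not support the x flag.
Lacking syntax, the only way to improve readability is convention. Mine is to add a comment before the tricky regular expression that contains the pattern written out as if the x flag were available. For example: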

/*
  \+?     #optional + sign
  (\d*)   #the integer part
  (       #begin decimal portion
     \.
     \d+  #decimal part
  )
 */
var re = /\+?(\d*)(\.\d+)/;

For more complex examples, you can see what I've done with this technique here and here.


jaxagkaj4#

In 2021 we can do this with template literals that have String.raw() applied to them.

VerboseRegExp `
    (
        foo*                  // zero or more foos
        (?: bar | baz )       // bar or baz
        quux?                 // maybe a quux
    )
    
    \s \t \r \n \[ \] \\ \/ \`

    H e l l o                 // invisible whitespace is ignored ...
    [ ]                       // ... unless you put it in a character class
    W o r l d !

    $ {}                      // Separate with whitespace to avoid interpolation!
`
`gimy`                        // flags go here

/*
returns the RegExp
/(foo*(?:bar|baz)quux?)\s\t\r\n\[\]\\\/\`Hello[ ]World!${}/gimy
*/

The implementation of VerboseRegExp:

const VerboseRegExp = (function init_once () {
    const cleanupregexp = /(?<!\\)[\[\]]|\s+|\/\/[^\r\n]*(?:\r?\n|$)/g
    return function first_parameter (pattern) {
        return function second_parameter (flags) {
            flags = flags.raw[0].trim()
            let in_characterclass = false
            const compressed = pattern.raw[0].replace(
                cleanupregexp,
                function on_each_match (match) {
                    switch (match) {
                        case '[': in_characterclass = true; return match
                        case ']': in_characterclass = false; return match
                        default: return in_characterclass ? match : ''
                    }
                }
            )
            return flags ? new RegExp(compressed, flags) : new RegExp(compressed)
        }
    }
})()

For what .raw[0] does, see Verbose Regular Expressions in JavaScript.
Note that, unlike a regex literal, the result is not cached by the JavaScript parser, so if you reuse it, save the generated regexp in a variable.
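
A minimal sketch of that advice, using a hypothetical `buildPattern` helper as a stand-in for any function that constructs a RegExp at run time (the counter exists only to show how often construction happens):

```javascript
// Hypothetical stand-in for any helper that builds a RegExp at run time
// (such as a verbose-regex template tag). It counts how often it runs.
let compileCount = 0;
function buildPattern() {
    compileCount += 1;
    return new RegExp(String.raw`^(\w+)\s(\w+)`);
}

// Build once, outside the hot path, and reuse the saved object.
const FIRST_TWO_WORDS = buildPattern();

function swapFirstTwoWords(line) {
    return line.replace(FIRST_TWO_WORDS, '$2 $1');
}

const lines = ['foo bar baz', 'one two three'];
console.log(lines.map(swapFirstTwoWords)); // [ 'bar foo baz', 'two one three' ]
console.log(compileCount);                 // 1
```

One thing to watch when reusing a saved regex: with the g or y flag, the object carries its lastIndex between test() and exec() calls.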


juzqafwq5#

I'd suggest putting a regular comment above the line with the regular expression to explain it.
That gives you much more freedom.


vuv7lop36#

You can use the verbose-regexp package.

import { rx } from 'verbose-regexp'

const dateTime = rx`
 (\d{4}) // year
 -       // separator
 (\d{2}) // month
`

// returns RegExp /(\d{4})-(\d{2})/


a9wyjsp77#

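Perl's /x flag (which allows whitespace and # comments) is a JavaScript language proposal, but it is stuck at stage 1 of the 4-stage process.
The modifiers proposal, e.g. /(?i:ignore case)normal/, which was at stage 3 at the time of writing, has had the x flag removed from it.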
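
Since support depends on the engine, one way to see what your runtime accepts is to try compiling a pattern and catch the SyntaxError. This is only a sketch: `regexCompiles` is a hypothetical helper, and the results vary between engines and versions.

```javascript
// Returns true if `new RegExp(source, flags)` is accepted by this engine.
function regexCompiles(source, flags) {
    try {
        new RegExp(source, flags);
        return true;
    } catch (err) {
        return false; // SyntaxError: unsupported flag or syntax
    }
}

const hasXFlag = regexCompiles('a', 'x');           // proposed extended-mode flag
const hasModifiers = regexCompiles('(?i:a)b', '');  // inline modifier group

console.log({ hasXFlag, hasModifiers });

if (hasModifiers) {
    // Only the part inside (?i:...) ignores case.
    const re = new RegExp('(?i:a)b');
    console.log(re.test('Ab')); // true
    console.log(re.test('AB')); // false
}
```

The patterns go through the RegExp constructor rather than regex literals so that an engine without support reports false instead of failing to parse the whole script.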
