JavaScript regex: matching a subgroup that ends with "/"

mum43rcc, posted 2023-03-28 in Java

I have two URLs and need to capture the string that follows the domain extension, but only if it is a two-character string ending in "/". So far I have this:

var t1 = "http://www.test.net/shop/test-3";
var t2 = "http://www.test.net/gb/shop/test-2";

var rgx = /\.([a-z]{0,3})\/([a-z]{2}\/)?/;


console.log(rgx.exec(t1));

console.log(rgx.exec(t2));

It outputs:

[".net/", "net", undefined]
[".net/gb/", "net", "gb/"]

This is correct, except that I want to capture "gb" rather than "gb/". Any ideas? I'm quite stuck.

deyfvvtc (answer 1)

One technique you can use is to put a capturing group inside an optional non-capturing group:

/\.([a-z]{0,3})\/(?:([a-z]{2})\/)?/
                 ^^^^           ^^

See the regex demo.

var t1 = "http://www.test.net/shop/test-3";
var t2 = "http://www.test.net/gb/shop/test-2";
console.log(/\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(t1));
console.log(/\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(t2));
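
To make the difference concrete, here is a minimal sketch comparing the question's pattern with the revised one on the same URL. The variable names `withSlash` and `withoutSlash` are only illustrative and do not appear in the original answer.

```javascript
var url = "http://www.test.net/gb/shop/test-2";

// Question's pattern: the slash sits inside capturing group 2.
var withSlash = /\.([a-z]{0,3})\/([a-z]{2}\/)?/.exec(url);

// Revised pattern: the optional part is non-capturing, and only
// the two letters are wrapped in a capturing group.
var withoutSlash = /\.([a-z]{0,3})\/(?:([a-z]{2})\/)?/.exec(url);

console.log(withSlash[2]);    // "gb/"
console.log(withoutSlash[2]); // "gb"

// The overall match is unchanged; only the content of group 2 differs.
console.log(withSlash[0] === withoutSlash[0]); // true
```

The slash is still consumed by the match in both cases; it simply no longer ends up in group 2.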

As for alternatives, this regex seems safer because it is more precise:

/^https?:\/\/[^\/]+\.([a-z]+)\/(?:([a-z]{2})\/)?/

See this regex demo.

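Details

  • ^ - start of string
  • https?:\/\/ - the protocol part (http:// or https://)
  • [^\/]+\.([a-z]+)\/ - the domain part: one or more characters other than /, then a ., then the TLD (one or more letters, [a-z]+) captured into group 1, then a /
  • (?:([a-z]{2})\/)? - an optional sequence:
      • ([a-z]{2}) - group 2, capturing 2 lowercase ASCII letters
      • \/ - a slash.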
var t1 = "http://www.test.net/shop/test-3";
var t2 = "http://www.test.net/gb/shop/test-2";
console.log(/^https?:\/\/[^\/]+\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(t1));
console.log(/^https?:\/\/[^\/]+\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(t2));
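
If the anchored pattern is used in more than one place, it could be wrapped in a small helper. The sketch below is only one possible way to do that: the function name `getCountryCode` and the choice to return `null` when there is no two-letter segment are assumptions, not part of the answer.

```javascript
// Hypothetical helper: returns the two-letter segment that follows the
// TLD (without its slash), or null if the URL has no such segment.
function getCountryCode(url) {
  // Group 1 = TLD, group 2 = optional two-letter segment.
  var match = /^https?:\/\/[^\/]+\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(url);
  // exec() returns null when the pattern does not match at all;
  // an optional group that did not participate is undefined.
  return match && match[2] !== undefined ? match[2] : null;
}

console.log(getCountryCode("http://www.test.net/shop/test-3"));    // null
console.log(getCountryCode("http://www.test.net/gb/shop/test-2")); // "gb"
```

Note that this pattern still requires the trailing slash, so a URL such as "http://www.test.net/gb" yields `null` here.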
qqrboqgw (answer 2)

Another approach is to parse out the first path segment that follows the domain extension:

function parse(str){
    // Remove the domain extension and everything before that.
    // Then return the first section of the rest, before `/`
    return str.replace(/.+\.\w+\//, '')
              .split('/')[0];
}
console.log(parse("http://www.test.net/shop/test-3"));
console.log(parse("http://www.test.net/gb/shop/test-2"));
console.log(parse("http://www.test.net/nl"));

This way, you can easily check the length of the returned result.
Regex explanation:

.+\.\w+\/
.+  - matches any character (except newline)
          Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\.  - matches the character . literally
\w+ - match any word character [a-zA-Z0-9_]
          Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\/  - matches the character / literally

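This regex essentially matches everything before the domain extension, the domain extension itself, and the / that follows it. Because .+ is greedy, the match extends to the last ".word/" sequence in the string, so a path segment that contains a dot would shift the result.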
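
The length check mentioned above might look like the following sketch. `parse` is repeated from the answer so that the block is self-contained; the wrapper name `twoCharSegment` and the `null` return value are assumptions made for illustration.

```javascript
// parse() as given in the answer: strip everything up to and including
// the domain extension and its slash, then take the first path segment.
function parse(str) {
  return str.replace(/.+\.\w+\//, '')
            .split('/')[0];
}

// Hypothetical wrapper: keep the segment only if it is exactly two
// characters long, otherwise return null.
function twoCharSegment(str) {
  var first = parse(str);
  return first.length === 2 ? first : null;
}

console.log(twoCharSegment("http://www.test.net/shop/test-3"));    // null
console.log(twoCharSegment("http://www.test.net/gb/shop/test-2")); // "gb"
console.log(twoCharSegment("http://www.test.net/nl"));             // "nl"
```

The length check alone does not verify that the two characters are letters, so a segment such as "12" would also pass; add a test like /^[a-z]{2}$/ if that matters.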

a6b3iqyw (answer 3)

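You can simply match the forward slash with a lookahead, such as (?=\/), which checks for the slash without putting it into the capturing group.
As Evaldas Raisutis mentioned in the comments, if the two characters are the last thing in the URL and there is no trailing slash, they will not match. You can therefore use (?=\/|$), which accepts either a / or the end of the line, making the trailing slash optional.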

\.([a-z]{0,3})\/([a-z]{2}(?=\/|$))?

See in Regex101

var t1 = "http://www.test.net/shop/test-3";
var t2 = "http://www.test.net/gb/shop/test-2";
var t3 = "http://www.test.net/de/";
var t4 = "http://www.test.net/fr";

var rgx = /\.([a-z]{0,3})\/([a-z]{2}(?=\/|$))?/;

console.log(rgx.exec(t1));
console.log(rgx.exec(t2));
console.log(rgx.exec(t3));
console.log(rgx.exec(t4));
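
As a rough sketch of what the lookahead version yields for each of the four URLs above, the helper below returns only group 2. The function name `segmentOf` is an assumption introduced for this illustration.

```javascript
var rgx = /\.([a-z]{0,3})\/([a-z]{2}(?=\/|$))?/;

// Hypothetical helper: returns group 2 (the two-letter segment), or
// undefined when the optional group did not match.
function segmentOf(url) {
  var m = rgx.exec(url);
  return m ? m[2] : undefined;
}

console.log(segmentOf("http://www.test.net/shop/test-3"));    // undefined
console.log(segmentOf("http://www.test.net/gb/shop/test-2")); // "gb"
console.log(segmentOf("http://www.test.net/de/"));            // "de"
console.log(segmentOf("http://www.test.net/fr"));             // "fr"

// A lookahead does not consume the slash, so the full match stops
// right after the two letters.
console.log(rgx.exec("http://www.test.net/gb/shop/test-2")[0]); // ".net/gb"
```

Unlike the non-capturing-group approach, the overall match here ends before the trailing slash, which matters if the full match (index 0) is used anywhere.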
