regex: Modify a regex to match strings with multiple full month names joined by a hyphen

Posted by b09cbbtk on 2023-01-21 in Other

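I'm using a regex to extract dates from a series of strings. The format varies slightly, but it always includes the full month name. The strings usually contain two dates that represent a range, like these: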

February 1, 2020 - March 18, 2020

February 1st 2020 - March 18th 2020

This worked great until I ran into dates like this:

June 1 - July 22, 2018

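where the year is omitted from the "from" part of the range because it is the same as the "to" year.
Below is the regex I roughly copied and adapted into my code. It's JavaScript, but I really think this is more of a regex question...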

const regex = /((\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)\D?)(\d{1,2}(st|nd|rd|th)?)?((\s*[,.\-\/]\s*)\D?)?\s*((19[0-9]\d|20\d{2})|\d{2})*/gm;

var myDateString1 = "January 8, 2020 - January 27, 2020"; // THIS WORKS GREAT!
var myDateString2 = "January 8 - January 27, 2020"; // THIS DOES NOT WORK GREAT!

var dates = myDateString1.match(regex);
// returns ["January 8, 2020","January 27, 2020"]

var dates2 = myDateString2.match(regex);
// returns ["January 8 - J"]
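A quick way to see where the hyphen goes is to run the question's regex once with exec() and look at its capture groups. This is a diagnostic sketch: the g/m flags are dropped so exec() returns the first match, and the name questionRegex is only for illustration.

```javascript
// The question's regex, copied as-is but without the g/m flags.
const questionRegex = /((\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)\D?)(\d{1,2}(st|nd|rd|th)?)?((\s*[,.\-\/]\s*)\D?)?\s*((19[0-9]\d|20\d{2})|\d{2})*/;

const m = questionRegex.exec("January 8 - January 27, 2020");

// Group 4 is the day, group 7 is the separator,
// group 6 is the separator plus whatever \D? grabbed after it.
console.log(JSON.stringify(m[0])); // "January 8 - J"
console.log(JSON.stringify(m[4])); // "8"
console.log(JSON.stringify(m[7])); // " - "
console.log(JSON.stringify(m[6])); // " - J"
```

Because - is in the separator class [,.\-\/] and \D? then accepts any non-digit, the first match runs on into the J of the second month name. The remaining text "anuary 27, 2020" does not start at a word boundary, so no second match is found.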

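Is there a way to modify it so that a match stops when it hits a hyphen? Then myDateString2 would return ["January 8", "January 27, 2020"].
Sometimes there are words before and after the dates in the string, for example: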

Presented from January 8, 2020 - January 27, 2020 at such and such place

so I don't think a regex that simply takes whatever comes before/after the hyphen would work.
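To illustrate that reasoning with the example sentence: splitting on the hyphen does separate the two halves, but the surrounding words stay attached, so the dates themselves still have to be recognized by the pattern.

```javascript
const text = "Presented from January 8, 2020 - January 27, 2020 at such and such place";

// Naive approach: split on " - " and hope each side is a date.
const [left, right] = text.split(" - ");

console.log(left);  // Presented from January 8, 2020
console.log(right); // January 27, 2020 at such and such place
```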


wqsoz72f1#

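You can use 2 capture groups and make the pattern more specific to the format of your strings.
You can omit the /m flag, because the pattern contains no anchors (^ or $).
Note that the pattern only matches text that looks like a date; it does not validate the dates themselves.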

\b((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\s*\d\d?(?:,\s+\d{4})?)\s+[,./-]\s+\b((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\s*\d\d?,\s+\d{4})\b

See the regex101 demo.

const regex = /\b((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\s*\d\d?(?:,\s+\d{4})?)\s+[,./-]\s+\b((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\s*\d\d?,\s+\d{4})\b/g;
const str = `January 8, 2020 - January 27, 2020
January 8 - January 27, 2020
Presented from January 8, 2020 - January 27, 2020 at such and such place
June 1 - July 22, 2018`;

console.log(Array.from(str.matchAll(regex), m => [m[1], m[2]]))
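Since group 1 can come back without a year (June 1 in the last line), a caller that needs complete dates might copy the year over from group 2. The sketch below does that with a hypothetical helper, fillStartYear, which is not part of the answer. It assumes an omitted start year always equals the end year, so a range that crosses New Year (for example December 20 - January 5, 2021) would get the wrong start year.

```javascript
// Same pattern as the answer's regex above (two capture groups, global flag).
const rangeRegex = /\b((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\s*\d\d?(?:,\s+\d{4})?)\s+[,./-]\s+\b((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\s*\d\d?,\s+\d{4})\b/g;

// Hypothetical helper: if the start date has no ", YYYY" part, borrow the end date's year.
function fillStartYear(start, end) {
  if (/,\s+\d{4}$/.test(start)) return start; // start already has a year
  const endYear = end.match(/\d{4}$/)[0];     // group 2 always ends with a 4-digit year
  return `${start}, ${endYear}`;
}

const input = `January 8 - January 27, 2020
June 1 - July 22, 2018`;

const ranges = Array.from(input.matchAll(rangeRegex), m => [fillStartYear(m[1], m[2]), m[2]]);
console.log(JSON.stringify(ranges));
// [["January 8, 2020","January 27, 2020"],["June 1, 2018","July 22, 2018"]]
```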

yzckvree2#

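Note: the original regex tries to match every possible format, which isn't possible this way. I reworked it to do about 75% of what it was meant to do. But it's fool's gold, etc., in the end...
The capture groups are there for debugging.
Just removing the hyphen from the character class and making the year part optional with a single ? at the end (instead of the original *) gets you the result you want.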
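To see why both changes are needed, this sketch compares the question's regex with only \- removed from the separator class against the answer's regex (with a g flag added so match() returns every match). With only the first change the hyphen is no longer consumed, but the \s* in front of the year still picks up the space after 8. In the answer's version the whitespace sits inside the optional year group (\s+ followed by the year), so it is only consumed when a year actually follows.

```javascript
const s = "January 8 - January 27, 2020";

// Question's regex with only "\-" removed from the separator class.
const hyphenRemoved = /((\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)\D?)(\d{1,2}(st|nd|rd|th)?)?((\s*[,.\/]\s*)\D?)?\s*((19[0-9]\d|20\d{2})|\d{2})*/g;

// The answer's regex, with a g flag added.
const answerRegex = /((\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)\D?)(\d{1,2}(st|nd|rd|th)?)?(((\s*[,./]))?\s+(19[0-9]\d|20\d{2})|\d{2})?/g;

console.log(JSON.stringify(s.match(hyphenRemoved))); // ["January 8 ","January 27, 2020"]
console.log(JSON.stringify(s.match(answerRegex)));   // ["January 8","January 27, 2020"]
```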

/((\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)\D?)(\d{1,2}(st|nd|rd|th)?)?(((\s*[,./]))?\s+(19[0-9]\d|20\d{2})|\d{2})?/

https://regex101.com/r/6NiNxy/1
Replacing the capture groups with clusters (?: ) and factoring the alternation one level further makes it faster.
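The extra factoring is meant to change only how the month alternation is organized, not what it accepts. A small sketch to check that: both alternations (with (Nov|Dec) made non-capturing in the first one) are anchored with ^ and $ and run against a hand-picked list of tokens. The token list is an assumption and not exhaustive, and this only checks equivalence; any speed difference would have to be measured separately.

```javascript
// Month alternation from the first regex, anchored so whole tokens are tested.
const flat = /^(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)$/;

// Factored alternation from the second regex, anchored the same way.
const factored = /^(?:J(?:an(?:uary)?|u(?:ne?|ly?))|Feb(?:ruary)?|Ma(?:r(?:ch)?|y)|A(?:pr(?:il)?|ug(?:ust)?)|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)$/;

const months = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

// Full names, 3-letter abbreviations, then a few near misses that should be rejected.
const tokens = [...months, ...months.map(mo => mo.slice(0, 3)),
  "Ju", "Ma", "Mayy", "Sept", "Junee", "Augu"];

const mismatches = tokens.filter(t => flat.test(t) !== factored.test(t));
console.log(mismatches.length); // 0
```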

/(?:\b\d{1,2}\D{0,3})?\b(?:J(?:an(?:uary)?|u(?:ne?|ly?))|Feb(?:ruary)?|Ma(?:r(?:ch)?|y)|A(?:pr(?:il)?|ug(?:ust)?)|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\D?(?:\d{1,2}(?:st|[nr]d|th)?)?(?:(?:\s*[,./])?\s+(?:19[0-9]\d|20\d{2})|\d{2})?/

https://regex101.com/r/NTR0WD/1

const regex = /(?:\b\d{1,2}\D{0,3})?\b(?:J(?:an(?:uary)?|u(?:ne?|ly?))|Feb(?:ruary)?|Ma(?:r(?:ch)?|y)|A(?:pr(?:il)?|ug(?:ust)?)|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\D?(?:\d{1,2}(?:st|[nr]d|th)?)?(?:(?:\s*[,./])?\s+(?:19[0-9]\d|20\d{2})|\d{2})?/g;
var myDateString1 = "January 8, 2020 - January 27, 2020";
var myDateString2 = "January 8 - January 27, 2020";

console.log(myDateString1.match(regex));
console.log(myDateString2.match(regex));
