regex: extract a year from anywhere in a free-text string

flseospp  posted on 2023-04-22  in Other

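I need to write a regular expression that matches the first year found anywhere in a line of free text.
The year will be 4 digits and will start with 20 or 21 (e.g. 2030 or 2199).
It should not match longer numbers such as 20304050.
Below is some JS code I wrote, together with its output. As you can see, each regex works for some cases but not for all of them.
Note: the final version of this will not be in JS, so I don't want a solution that needs extra code, just a pure regex, although I can accept an extra trailing period and truncate the result to 4 characters.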

const values = [
  '2025',
  '2150 is the year to match',
  'the year is 2030 see ref 2099662',
  'Should match the year here YEAR_2140 even though it has non numric chars preceeding it',
  'Should match the year at the end of a string like this - 2140',
  'ref 2099662 the year is 2140. And there is another sentence',
  'ref 2099662 the end of the string is the year 2140',
  'There is no year here 2055667'
]
console.log('   regx1', 'regx2,', 'regx3,', 'input string')

values.forEach((value, index) => {
  value = value.trim()
  const regex1 = /2[01]{1}[0-9]{2}/
  const regex2 = /2[01]{1}[0-9]{2}[^0-9]{1}/
  const regex3 = /2[01]{1}[0-9]{2}[^0-9]{1}/

  const year1 = (value.match(regex1) || [])[0] || '     '
  const year2 = (value.match(regex2) || [])[0] || '     '
  const year3 = (value.match(regex3) || [])[0] || '     '

  console.log(`${index + 1}) ${year1}, ${year2}, ${year3}, "${value}",`)
})

This code outputs:

regx1 regx2, regx3, input string
1) 2025,      ,      , "2025",
2) 2150, 2150 , 2150 , "2150 is the year to match",
3) 2030, 2030 , 2030 , "the year is 2030 see ref 2099662",
4) 2140, 2140 , 2140 , "Should match the year here YEAR_2140 even though it has non numric chars preceeding it",
5) 2140,      ,      , "Should match the year at the end of a string like this - 2140",
6) 2099, 2140., 2140., "ref 2099662 the year is 2140. And there is another sentence",
7) 2099,      ,      , "ref 2099662 the end of the string is the year 2140",
8) 2055,      ,      , "There is no year here 2055667",

gdx19jrr1#

(?:\b|\W)?(?<year>2(?:0|1)\d\d)\b

This pattern works for me

const values = [
  '2025',
  '2150 is the year to match',
  'the year is 2030 see ref 2099662',
  'Should match the year here YEAR_2140 even though it has non numric chars preceeding it',
  'Should match the year at the end of a string like this - 2140',
  'ref 2099662 the year is 2140. And there is another sentence',
  'ref 2099662 the end of the string is the year 2140',
  'There is no year here 2055667'
];

const reg = /(?:\b|\W)?(?<year>2[01]\d\d)\b/g;

for (const v of values) {

  //const matches = reg.exec( v );
  const matches = Array.from( v.matchAll( reg ) );
  
  const matchJson = JSON.stringify( matches );
  addRow( v, matchJson );
}

function addRow( x, y ) {
  const tbody = document.getElementById('rows');
  const tr    = tbody.insertRow(-1);

  const tdInput = tr.insertCell();
  const tdMatch = tr.insertCell();

  tdInput.textContent = x;
  tdMatch.textContent = y;
}
/* CSS for the demo table */
table {
  border: 1px outset #ccc;
}

th,
td {
  border: 1px inset #ccc;
  padding: 0.5em;
}
<!-- HTML for the demo table -->
<table>
  <thead>
    <tr>
      <th>Input string</th>
      <th>Match</th>
    </tr>
  </thead>
  <tbody id="rows">
  </tbody>
</table>
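
For a quick check outside the browser, the same idea can be run with `String.prototype.match` and no `g` flag, which returns only the first match. This is a minimal sketch rather than the answerer's code: `firstYear` is a hypothetical helper name, and the optional `(?:\b|\W)?` prefix is left out because, being optional, it should not change which year is captured. The last input is an added assumption, not one of the question's strings; it shows that with only a trailing `\b` the final four digits of a longer number can still match.

```javascript
// Hypothetical helper: returns the first year-like match, or null.
// Pattern: "2", then 0 or 1, then two digits, then a word boundary.
const yearPattern = /(?<year>2[01]\d\d)\b/;

function firstYear(text) {
  const m = text.match(yearPattern);
  return m ? m.groups.year : null;
}

const samples = [
  '2025',
  'Should match the year here YEAR_2140 even though it has non numric chars preceeding it',
  'ref 2099662 the year is 2140. And there is another sentence',
  'There is no year here 2055667',
  'order 12025', // added input (assumption): a 5-digit number ending in a year-like run
];

for (const s of samples) {
  console.log(firstYear(s), '<-', s);
}
```

If digits immediately before the year must also be rejected, the pattern needs a check on the preceding character as well as the following one.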

68bkxrlz2#

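I need to write a regular expression that matches the first year found anywhere in a line of free text.
The code below matches the first occurrence of a year in each line. I use Java to demonstrate it.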

import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> textList = List.of("2025",
        "2150 is the year to match",
        "the year is 2030 see ref 2099662",
        "Should match the year here YEAR_2140 even though it has non numric chars preceeding it",
        "Should match the year at the end of a string like this - 2140",
        "ref 2099662 the year is 2140. And there is another sentence",
        "ref 2099662 the end of the string is the year 2140",
        "There is no year here 2055667");

// Assumption: the pattern definition is missing from this copy of the answer.
// The lookaround pattern below is a stand-in that reproduces the output shown:
// a 20xx or 21xx year that is neither preceded nor followed by another digit.
String regex = "(?<!\\d)(2[01]\\d{2})(?!\\d)";

Pattern pattern = Pattern.compile(regex);
for (String text : textList) {
    Matcher m = pattern.matcher(text);
    // find() stops at the first match, so group(1) is the first year in the line
    if (m.find()) {
        System.out.printf("Matches %s : '%s'%n", m.group(1), text);
    } else {
        System.out.printf("No year found in '%s'%n", text);
    }
}

Prints:

Matches 2025 : '2025'
Matches 2150 : '2150 is the year to match'
Matches 2030 : 'the year is 2030 see ref 2099662'
Matches 2140 : 'Should match the year here YEAR_2140 even though it has non numric chars preceeding it'
Matches 2140 : 'Should match the year at the end of a string like this - 2140'
Matches 2140 : 'ref 2099662 the year is 2140. And there is another sentence'
Matches 2140 : 'ref 2099662 the end of the string is the year 2140'
No year found in 'There is no year here 2055667'

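Note: the doubled backslashes (\\) are Java string escaping. Other regex engines, or a regex literal, may require you to replace \\ with \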
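
As a rough JavaScript counterpart to the Java demo, the sketch below uses digit lookarounds so that a year is accepted only when no digit sits directly before or after it. The pattern `(?<!\d)(2[01]\d{2})(?!\d)` is an assumption chosen to reproduce the output listed above, not a pattern quoted from the answer, and `extractYear` is a hypothetical name. It also illustrates the escaping note: a regex literal takes single backslashes, while a pattern built from a string needs them doubled. Lookbehind needs a reasonably recent engine (ES2018 or later in JavaScript).

```javascript
// Assumed pattern: not preceded by a digit, 20xx or 21xx, not followed by a digit.
const fromLiteral = /(?<!\d)(2[01]\d{2})(?!\d)/;
// The same pattern built from a string: every backslash is doubled.
const fromString = new RegExp('(?<!\\d)(2[01]\\d{2})(?!\\d)');

// Hypothetical helper: returns the first year as a string, or null if none.
function extractYear(text, pattern = fromLiteral) {
  const m = pattern.exec(text);
  return m ? m[1] : null;
}

const values = [
  '2025',
  '2150 is the year to match',
  'the year is 2030 see ref 2099662',
  'Should match the year here YEAR_2140 even though it has non numric chars preceeding it',
  'Should match the year at the end of a string like this - 2140',
  'ref 2099662 the year is 2140. And there is another sentence',
  'ref 2099662 the end of the string is the year 2140',
  'There is no year here 2055667',
];

for (const v of values) {
  console.log(extractYear(v) ?? 'no year', '|', v);
}
```

Because the lookarounds are zero-width, the match is exactly four characters, so no trailing period needs to be trimmed. In an engine without lookbehind, one possible fallback is `(?:^|\D)(2[01]\d{2})(?!\d)`, reading the year from group 1.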
