How to return the text and CRLFs between two date strings using VBA and RegExp

mnowg1ta  posted on 2023-05-08 in Other

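I have a Word macro whose purpose is to return the content between the first two date strings that begin a line in a Word file.
For example, suppose the Word document contains the following text: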

5/1/2023
blah
blah
blah

4/1/2023
jabber
jabber
jabber

The Word macro should return:

blah
blah
blah

Instead, it returns:

blahblahblah

The CRLFs are removed, which is not what I want.
Here is the Word VBA code that I run against the active document:

Function getLastNote()
'extracts content between two date strings. The result should be the latest note entry
Dim regEx As Object, matchCollection As Object, extractedString As String

Set regEx = CreateObject("VBScript.RegExp")

With regEx
  .IgnoreCase = True
  .Global = False    ' Only look for 1 match; False is actually the default.
  .Pattern = "\d{1,2}\/\d{1,2}\/\d{4}((.|\n)*?)\d{1,2}\/\d{1,2}\/\d\d\d\d"
End With

Set matchCollection = regEx.Execute(ActiveDocument.Content.text)
If matchCollection.Count > 0 Then
    ' Extract the first submatch's (capture group's) value -
    getLastNote = matchCollection(0).submatches(0)
Else
    getLastNote = ActiveDocument.Content 'there's no match so return the text of the doc.
End If
End Function

How can I preserve the CRLFs? Thanks!

3duebb1j


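I think the problem lies in how line breaks are encoded in the text. I expected Windows line endings, i.e. \r\n, so I used \r?\n in the pattern to handle both Linux and Windows. But it seems that ActiveDocument.Content.Text contains only \r, which is quite surprising!
Either way, we can replace .*? with [\s\S]*? to match anything, including line breaks. This is needed because the VBA regex engine has no s modifier to make . match line breaks as well.
I also changed your pattern to match up to either the next date or the end of the document, in case you only have one note.
I also captured the date in a group, so that we can return an object holding both the date and the text.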
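
The point about . and line breaks can be checked in isolation. The following is a minimal sketch, not part of the original answer: PatternMatches and DemoDotVersusAnyChar are hypothetical names, and the sample string uses a line feed (vbLf), the character that . is documented not to match in VBScript.RegExp.

```vba
Option Explicit

'Hypothetical helper: True when sPattern matches somewhere in sText.
Function PatternMatches(ByVal sPattern As String, ByVal sText As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = sPattern
    PatternMatches = re.Test(sText)
End Function

Sub DemoDotVersusAnyChar()
    Dim sample As String
    'Two one-letter lines separated by a line feed.
    sample = "a" & vbLf & "b"

    'The dot does not match the line feed, so this prints False.
    Debug.Print PatternMatches("^a.b$", sample)
    '[\s\S] matches any character at all, so this prints True.
    Debug.Print PatternMatches("^a[\s\S]b$", sample)
End Sub
```

Run DemoDotVersusAnyChar and read the two results in the Immediate window.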

VBA code

clsNote

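Create a new class module named clsNote and set its Instancing property to 2 - PublicNotCreatable so that your function can create instances of it.
The code for this class is simple, but it can be improved later: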

Option Explicit

Public strDate As String 'It would be better with a DateTime object.
Public strText As String

Public Sub DisplayMsgBox()
    MsgBox strText, vbInformation, strDate
End Sub

Function to get the latest note

'Add ref to "Microsoft VBScript Regular Expressions 5.5" in Tools -> References.

Option Explicit

Function getLastNote() As clsNote
'Returns the last note from the current document.
'The result should be the latest note entry, at the top of the document.

    Dim regEx As RegExp, matchCollection As Object, oNote As clsNote
    Set regEx = New RegExp
    Set oNote = New clsNote
    
    With regEx
      .IgnoreCase = True
      .MultiLine = False
      'Only look for 1 match; False is actually the default.
      .Global = False
      'First group is the date and second group is the content between the next date or document end.
      .Pattern = "(\d{1,2}\/\d{1,2}\/\d{4})[\r\n]+([\s\S]*?)[\r\n]+(?:\d{1,2}\/\d{1,2}\/\d{4}|$)"
    End With
    
    Set matchCollection = regEx.Execute(ActiveDocument.Content.Text)
    
    If matchCollection.Count > 0 Then
        'Submatches:
        '- index 0: date.
        '- index 1: text.
        oNote.strDate = matchCollection(0).submatches(0)
        oNote.strText = matchCollection(0).submatches(1)
    Else
        'There's no match so return the text of the doc.
        oNote.strDate = ""
        oNote.strText = ActiveDocument.Content.Text
    End If
    
    Set getLastNote = oNote

End Function
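
To try the pattern without opening a document, it can be run against a hard-coded string. This is only a sketch under stated assumptions: ExtractFirstNote and DemoNotePattern are hypothetical names, the sample mimics the question's document with bare \r paragraph marks, and late binding is used so no reference needs to be added.

```vba
Option Explicit

'Hypothetical helper: returns the text of the first note in sText,
'or an empty string when the pattern does not match.
Function ExtractFirstNote(ByVal sText As String) As String
    Dim re As Object, matches As Object
    Set re = CreateObject("VBScript.RegExp")
    'Same pattern as in getLastNote above.
    re.Pattern = "(\d{1,2}\/\d{1,2}\/\d{4})[\r\n]+([\s\S]*?)[\r\n]+(?:\d{1,2}\/\d{1,2}\/\d{4}|$)"
    Set matches = re.Execute(sText)
    If matches.Count > 0 Then
        ExtractFirstNote = matches(0).SubMatches(1)
    End If
End Function

Sub DemoNotePattern()
    Dim sample As String, parts() As String
    sample = "5/1/2023" & vbCr & "blah" & vbCr & "blah" & vbCr & "blah" & vbCr & vbCr & _
             "4/1/2023" & vbCr & "jabber" & vbCr
    'Splitting on vbCr shows that the line breaks survived the match.
    parts = Split(ExtractFirstNote(sample), vbCr)
    Debug.Print UBound(parts) + 1   'prints 3, one entry per "blah" line
End Sub
```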

Testing from the Immediate window

Assuming the getLastNote() function is in ThisDocument, just run:

ThisDocument.getLastNote.DisplayMsgBox
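
If the caller needs real CR+LF pairs, for example before writing the note to a text file, the returned text could be converted afterwards. A minimal sketch, assuming the text contains bare \r characters as observed above; ToCrLf and DemoToCrLf are hypothetical names and not part of the original answer.

```vba
Option Explicit

'Hypothetical helper: converts every line ending in s to CR+LF.
Function ToCrLf(ByVal s As String) As String
    'Collapse existing CR+LF pairs first so they are not doubled.
    s = Replace(s, vbCrLf, vbCr)
    'Treat stray LF characters as line endings too.
    s = Replace(s, vbLf, vbCr)
    ToCrLf = Replace(s, vbCr, vbCrLf)
End Function

Sub DemoToCrLf()
    Dim s As String
    s = ToCrLf("blah" & vbCr & "blah" & vbCr & "blah")
    '12 letters plus two CR+LF pairs.
    Debug.Print Len(s)   'prints 16
End Sub
```

With the class above, this could be applied as ToCrLf(ThisDocument.getLastNote.strText).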
