regex: How to handle a negative lookbehind in an Excel VBA regular expression?

nzrxty8p  posted on 2023-05-08  in  Other

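The Excel VBA code below uses a regular expression to extract section numbers from HTML files. However, the pattern contains a negative lookbehind, which the VBA regex engine (VBScript.RegExp) does not support: "(?<!tbl"")>(\d(\.\d)+)<"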

Sub GetAllSectionNumbers()
    LRb = Cells(Rows.Count, "B").End(xlUp).Row
    Range("B7:C" & LRb).ClearContents
    Dim fileDialog As fileDialog
    Set fileDialog = Application.fileDialog(msoFileDialogOpen)
    
    fileDialog.AllowMultiSelect = True
    fileDialog.Title = "Select HTML files"
    fileDialog.Filters.Clear
    fileDialog.Filters.Add "HTML files", "*.htm;*.html", 1
    
    If fileDialog.Show <> -1 Then Exit Sub
    
    Dim file As Variant
    For Each file In fileDialog.SelectedItems
        Dim fileContents As String
        Open file For Input As #1
        fileContents = Input$(LOF(1), 1)
        Close #1
        
        Dim regex As Object
        Set regex = CreateObject("VBScript.RegExp")
        regex.Pattern = "(?<!tbl"")>(\d(\.\d)+)<"
        regex.Global = True
        regex.IgnoreCase = True
        regex.MultiLine = True
        TRET = regex.Pattern
        filePath = file
        fileFolder = Left(filePath, InStrRev(filePath, "\"))
        fileNameSource = Mid(filePath, InStrRev(filePath, "\") + 1, 100)
    
        Dim match As Object
        Set match = regex.Execute(fileContents)
        
        Dim i As Long
        For i = 0 To match.Count - 1
            LRb = Cells(Rows.Count, "B").End(xlUp).Row + 1
    
            Range("B" & LRb).Value = match.Item(i).SubMatches(0)
            Range("C" & LRb).Value = fileNameSource
        Next i
    Next file
    MsgBox "Done!"
End Sub

Is there an alternative regex solution for this problem?


b91juud3 #1

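When you are extracting text, the usual workaround is "the best regex trick": match what you do not need, and match and capture what you do need.
For this particular case, the regex looks like this: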

tbl">\d(?:\.\d)+<|>(\d(?:\.\d)+)<

In code, it looks like this:

regex.Pattern = "tbl"">\d(?:\.\d)+<|>(\d(?:\.\d)+)<"

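Then, in your code, check whether match.SubMatches(0) actually holds a value, and use it only if it does, because that is the text you need. When the first alternative (the tbl" case) matches, the capturing group does not participate and SubMatches(0) is empty.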
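The check described above could be sketched roughly as follows. This is only an illustration, not the asker's final code: `ExtractSectionNumbers` and `DemoExtract` are hypothetical names, the sample HTML string is made up, and `Len(...) > 0` is one possible way to test for an empty submatch.

```vba
' Sketch: collect section numbers, skipping any that follow tbl".
' ExtractSectionNumbers is a hypothetical helper, not part of the original post.
Function ExtractSectionNumbers(ByVal html As String) As Collection
    Dim regex As Object
    Set regex = CreateObject("VBScript.RegExp")
    ' First alternative consumes the unwanted tbl">1.2< form without capturing;
    ' second alternative captures the wanted number in group 1.
    regex.Pattern = "tbl"">\d(?:\.\d)+<|>(\d(?:\.\d)+)<"
    regex.Global = True
    regex.IgnoreCase = True

    Dim results As New Collection
    Dim m As Object
    For Each m In regex.Execute(html)
        ' Len() is 0 for both Empty and "", so matches where the
        ' capturing group did not participate are skipped.
        If Len(m.SubMatches(0)) > 0 Then results.Add m.SubMatches(0)
    Next m
    Set ExtractSectionNumbers = results
End Function

Sub DemoExtract()
    ' Made-up sample input: 1.2 sits after tbl" and should be ignored.
    Dim num As Variant
    For Each num In ExtractSectionNumbers("<td class=""tbl"">1.2</td><h2>3.4.5</h2>")
        Debug.Print num   ' prints 3.4.5
    Next num
End Sub
```

In the question's loop, the same guard would wrap the two lines that write to columns B and C, so that rows are only added for matches where the group captured something.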
See the regex demo.
