regex: Retrieve a comic's issue number from a string

gwbalxhn posted on 2023-06-07 in Other

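I'm not very good with RegEx. I need to retrieve the issue number from a comic book's long title, which includes the series name and sometimes the artist's name. The issue number is usually the last number in the string, but not always. Here are 6 examples that reflect the range of variation I've seen: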

STAR WARS: DOCTOR APHRA 32 CHRIS SPROUSE RETURN OF THE JEDI 40TH ANNIVERSARY VARIANT
DEADPOOL 7
X-23: DEADLY REGENESIS 3 GERALD PAREL VARIANT
SPIDER-MAN 2099: DARK GENESIS 5
THE GODFORSAKEN 99 OF KRONOS 2 KEN GRAGGINS VARIANT
Teenage Mutant Ninja Turtles: Saturday Morning Adventures (2023-) #1 Variant RI (10) (Dooney)

I'm using VBA, and this is my current function:

Function ExtractText(c As Range) As String
    Dim rgx As RegExp
    Dim match As match
    Dim mc As MatchCollection
    Dim sComicNo As String, sPattern As Variant
    Dim lPos As Long, x As Long

    sComicNo = ""
    sPattern = Array(" [0-9] ", " #[0-9] ", " [0-9][0-9] ", " #[0-9][0-9] ", " [0-9][0-9]", " #[0-9][0-9]", " [0-9]", " #[0-9]")
    lPos = 0
    
    Set rgx = New RegExp
    
    On Error GoTo ErrHandler
    
    Do While sComicNo = ""
        
        With rgx
            .Pattern = sPattern(x)
            .Global = True
        
            If .Test(c.Value) Then
                Set mc = .Execute(c.Value)
                
                If mc.Count > 0 Then
                    Set match = mc.Item(mc.Count - 1)
                Else
                    ExtractText = ""
                End If
                
                lPos = match.FirstIndex
                sComicNo = WorksheetFunction.Trim(match.Value) & "|" & lPos
            Else
                sComicNo = ""
            End If
            
            x = x + 1
            
            ' sPattern has 8 entries (indexes 0 to 7), so stop once x passes the last one
            If x > UBound(sPattern) Then
                ExtractText = sComicNo
                Exit Function
            End If
        
        End With
        
    Loop
    
    ExtractText = sComicNo
    
ErrHandler:

    Exit Function
    
End Function

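These patterns match all of my examples except Spider-Man 2099, and I'm probably overlooking other possible variations. The function also returns the position of the match, which I use for a separate purpose. I've tried to order the patterns so that the most specific cases are tried first and the looser ones only after those fail.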

wgeznvg7 #1

I don't know VBA, but from trying to understand the code, I think your regexes can be simplified to:

\s      # Match a whitespace,
#?      # an optional '#', then
\d\d?   # 1 or 2 digits, followed by
\b      # a word boundary (prevents numbers with 3+ digits from being matched).

Try it on regex101.com
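For what it's worth, here is a minimal VBA sketch of how the pattern above might be dropped into a function like the one in the question. It assumes late binding through CreateObject("VBScript.RegExp"), so no library reference is needed. ExtractIssue and DemoExtractIssue are hypothetical names, and the "value|position" return format simply mirrors the question's function. VBScript's RegExp has no free-spacing mode, so the pattern is written on one line without the comments.

```vba
Option Explicit

' Sketch only: returns the last 1-2 digit number (optionally prefixed
' with "#") that follows a whitespace character, plus the 0-based
' position of the match, as "value|position".
' ExtractIssue is a hypothetical name, not from the original post.
Function ExtractIssue(ByVal title As String) As String
    Dim rgx As Object   ' late-bound VBScript.RegExp
    Dim mc As Object    ' MatchCollection
    Dim m As Object     ' Match

    Set rgx = CreateObject("VBScript.RegExp")
    rgx.Pattern = "\s#?\d\d?\b"
    rgx.Global = True   ' collect every match, not just the first

    Set mc = rgx.Execute(title)
    If mc.Count = 0 Then Exit Function   ' returns "" when nothing matches

    Set m = mc.Item(mc.Count - 1)        ' last match in the title

    ' \s consumed exactly one character, so drop it with Mid$
    ExtractIssue = Mid$(m.Value, 2) & "|" & m.FirstIndex
End Function

' Hypothetical demo: prints the results to the Immediate window
Sub DemoExtractIssue()
    Debug.Print ExtractIssue("SPIDER-MAN 2099: DARK GENESIS 5")
    Debug.Print ExtractIssue("THE GODFORSAKEN 99 OF KRONOS 2 KEN GRAGGINS VARIANT")
End Sub
```

As far as I can tell, the only match in the Spider-Man title is " 5", because \b cannot fall between the digits of 2099. Taking the last match, as the question's function does, is what picks 2 rather than 99 in the second title.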
Alternatively, you can use a lookbehind to skip the trimming step:

(?<=\s)#?\d\d?\b

...where (?<=\s) means "match something that is preceded by a whitespace character".
Try it on regex101.com
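One caveat for VBA specifically: the VBScript RegExp engine supports lookahead but not lookbehind, so (?<=\s) will not work there. A capture group can stand in for it. The sketch below is an assumption-laden example rather than a tested drop-in: ExtractIssueGroup is a hypothetical name, and it again uses late binding and the question's "value|position" format.

```vba
Option Explicit

' Sketch only: same idea as the lookbehind pattern, but with a capture
' group, because VBScript.RegExp does not support (?<=...).
' ExtractIssueGroup is a hypothetical name, not from the original post.
Function ExtractIssueGroup(ByVal title As String) As String
    Dim rgx As Object   ' late-bound VBScript.RegExp
    Dim mc As Object    ' MatchCollection
    Dim m As Object     ' Match

    Set rgx = CreateObject("VBScript.RegExp")
    rgx.Pattern = "\s(#?\d\d?)\b"   ' group 1 holds the number without the whitespace
    rgx.Global = True

    Set mc = rgx.Execute(title)
    If mc.Count = 0 Then Exit Function   ' returns "" when nothing matches

    Set m = mc.Item(mc.Count - 1)        ' last match in the title

    ' SubMatches(0) is the text captured by the first group.
    ' FirstIndex points at the whitespace, so the number starts one
    ' character later (still 0-based).
    ExtractIssueGroup = m.SubMatches(0) & "|" & (m.FirstIndex + 1)
End Function
```

With this version no trimming is needed, since the leading whitespace sits outside the captured group.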
