regex 从带条件的字符串中提取一个数字

dauxcl2d  于 2023-04-13  发布在  其他
关注(0)|答案(4)|浏览(205)

下面代码从字符串中提取所有数字,甚至合并它们。
但我只需要提取一个带条件的数:
1-数字为一个两个字符。
2-如果数字后面是"inchin,则提取它并忽略字符串中的其余数字。
3-如果没有找到上述条件(2),则提取字符串中的第一个数字并忽略其余数字。
| 当前字符串|预期结果|
| --------------|--------------|
| 检查-8 ''水管12|八|
| INSPECT- 18水12”|十二岁|
| PM-6 in管道,来自M37 st|六|
| PM- 6英寸管道,来自H44|六|
| PM-36管道自M-1 T|三十六|
| 来自MT的PM-123管道||

Public Function GetNumeric(CellRef As String)
    Dim StringLength As Long, i As Long, Result As Variant
    StringLength = Len(CellRef)
    For i = 1 To StringLength
      If IsNumeric(Mid(CellRef, i, 1)) Then
         Result = Result & Mid(CellRef, i, 1)
      End If
    Next i
    GetNumeric = Result
End Function
sxpgvts3

sxpgvts31#

也许可以使用正则表达式创建自己的UDF。也许类似于:

Public Function RegexExtract(str, pat, Optional gFlag As Boolean = False, Optional pos As Integer = 0, Optional cse as Boolean = True) As String

Static RE As Object: If RE Is Nothing Then Set RE = CreateObject("vbscript.regexp")

RE.Pattern = pat
RE.Global = gFlag
RE.IgnoreCase = cse

If RE.Test(str) Then
    RegexExtract = RE.Execute(str)(pos)
Else
    RegexExtract = vbNullString
End If

End Function

注意,我创建了一个可选的全局标志,默认为false,它应该只拉取单元格中的第一个命中。可选的pos变量用于返回特定的匹配,以防您希望在将全局标志设置为true时以某种方式返回其他数字。还请注意,使用设置为true的case标志,默认情况下匹配不区分大小写。
你可以这样调用上面的代码:

=IFERROR(--RegexExtract(A1,"\b\d\d?(?!\d)(?=\s*(?:""|''|in(?:ch)?\b)?)"),"")

所用模式代表:

  • \b\d\d?-1位数字和第二位可选数字的字边界;
  • (?!\d)-负先行,不再Assert数字;
  • (?=\s*(?:"|''|in(?:ch)?\b)?)-Assert位置的正向前看后面是0+(贪婪)空格字符和:
  • "-双引号,或;
  • ''-两个单引号,或;
  • in(?:ch)?\b-字面上的'in'后跟可选的'ch'和一个单词边界,以确认字母不是较大子串的一部分,以防止误报。
    EDIT1

根据执行主任的以下评论;由于OP允许没有英寸的数字也被匹配,因此这里的添加是为了包括负前瞻,其将Assert不存在有效图案的第二次出现:

\b\d\d?(?!\d|.*\b\d+\s*(?:""|''|in(?:ch)?\b))(?=\s*(?:""|''|in(?:ch)?\b)?)

我想这是隐含的相同:

\b\d\d?(?!\d|.*\b\d+\s*(?:""|''|in(?:ch)?\b))

EDIT2

要允许小数,您可以包含一个可选的非捕获组:

=IFERROR(--RegexExtract(A2,"\b\d\d?(?:\.\d+)?(?!\d|.*\b\d+\s*(?:""|''|in(?:ch)?\b))"),"")
wlzqhblo

wlzqhblo2#

这里有两个不需要正则表达式的解决方案。
第一种解决方案是对字符串应用一系列变换,这样我们就可以使用Split来获得一个字符串序列,其中一些字符串是数字。这里的问题是选择正确的变换,以便Split可以应用于隔离数字。有时这可能是不可能的。
第二种解决方案只是解析字符串,直到它提取出一个数字字符序列,然后返回该数字字符串进行进一步处理。这可能是您的情况下的最佳解决方案。
请注意,这两种解决方案都没有针对边缘情况进行测试。
假设您试图解析看起来是自由格式的文本,那么可能会有很多边缘情况。

Sub Test()

    Dim myC As Collection
    Set myC = New Collection
    With myC
    
        .Add "INSPECT - 8" & Chr$(34) & " Water 12 Pipe    8"
        .Add "INSPECT- 18" & Chr$(34) & " Water 12   18"
        .Add "PM-6in Pipe From M37 st 6"
        .Add "PM- 6 inch Pipe From H44    6"
        .Add "PM-36 Pipe From M-1T    36"
        .Add "PM-123 Pipe From MT"
        
    End With
    

    Dim myItem As Variant
    Dim myNumber As Long
    For Each myItem In myC
    
    'Option 1
'        If TryGetFirstNumber(myItem, myNumber) Then
'            Debug.Print myNumber
'        End If
     ' option 2
'        Debug.Print ParseFirstNumber(VBA.CStr(myItem))
    Next
    
End Sub

' Pass ByVal so we don't alter the original string
Public Function ApplyTransforms(ByVal ipString As String) As String

    ipString = VBA.LCase(ipString)
    ipString = VBA.Replace(ipString, "-", " ")
    ipString = VBA.Replace(ipString, VBA.Chr(34), " ")
    ipString = VBA.Replace(ipString, "in ", " ")
    ipString = VBA.Replace(ipString, "inch ", " ")
    
    ApplyTransforms = ipString
    
End Function

'The try function indicates success by the returned boolean value, the result of the success is returned Byref in parameter opNumber
Public Function TryGetFirstNumber(ByRef ipString As Variant, ByRef opNumber As Long, Optional ipLength As Long = 2) As Boolean

    
    Dim myArray As Variant
    myArray = VBA.Split(ApplyTransforms(ipString))
    
    Dim myItem As Variant
    For Each myItem In myArray
    
        If VBA.IsNumeric(myItem) Then
        
            If VBA.Len(myItem) <= ipLength Then
            
                opNumber = VBA.CLng(myItem)
                TryGetFirstNumber = True
                Exit Function
                
            End If
            
        End If
            
    Next
    
    TryGetFirstNumber = False
    
End Function

Public Function ParseFirstNumber(ByRef ipString As String) As String

    Dim myIndex As Long
    myIndex = 1
    
    Dim myLen As Long
    myLen = VBA.Len(ipString)
    
    Dim myNumber As String
    myNumber = vbNullString

    Do While myIndex <= myLen
        If VBA.InStr("0123456789", VBA.Mid$(ipString, myIndex, 1)) > 0 Then
            Exit Do
        End If
        myIndex = myIndex + 1
    Loop

    If myIndex > myLen Then
        ParseFirstNumber = myNumber
        Exit Function
    End If

    Do While VBA.InStr("0123456789", VBA.Mid$(ipString, myIndex, 1)) > 0
        myNumber = myNumber & VBA.Mid$(ipString, myIndex, 1)
        myIndex = myIndex + 1
        If myIndex > myLen Then
            ParseFirstNumber = myNumber
            Exit Function
        End If
    Loop
    
    ParseFirstNumber = myNumber
    
End Function
ctrmrzij

ctrmrzij3#

我认为答案已经在你提出的要求中了。你需要添加if语句作为检查来做这个或case语句,我建议在添加到结果之前定义一个额外的变量来检查。将this定义为你正在循环的字符串的字符。示例:

Dim Check as string

Check = Mid(CellRef, i, 1)
If Check =  Chr(34) then GetNumeric = Result

处理多个数字而没有进一步的定义将产生无法解决的情况,例如,倒数第二个例子没有指示1或36是否正确。

v64noz0r

v64noz0r4#

请,(也)测试下一种方式.它使用标准VBA和一个数组进行处理.处理后的数组内容将在代码结束时被丢弃,所以,即使对于大范围,它也应该非常快.它假设要处理的范围从“A1”开始,标题存在于工作表的第一行:

Sub extractInchesNoFromAllRange()
   Dim sh As Worksheet, lastR As Long, arr, i As Long
   Dim dblQ As Long, sQ As Long, strIn As Long, No As String
   
   Set sh = ActiveSheet
   lastR = sh.Range("A" & sh.rows.count).End(xlUp).row
   
   arr = sh.Range("A2:B" & lastR).Value2
   
   For i = 1 To UBound(arr)
        dblQ = InStr(arr(i, 1), """") 'check if a double quote caracters exists and return its postition if it does
        sQ = InStr(arr(i, 1), "''")      'check if two simple quote caracters exists and return its postition if it does
        strIn = InStr(arr(i, 1), "in") 'the same as above for "in" string
        
        No = "" 'reinitialize the variable to keep the extracted number (as string...)
        If dblQ > 0 Or sQ > 0 Then 'if doble quote exists:
            If IsNumeric(Mid(arr(i, 1), IIf(dblQ > 0, dblQ, sQ) - 1, 1)) Then 'if a number exists before the quote ch
                No = Mid(arr(i, 1), IIf(dblQ > 0, dblQ, sQ) - 1, 1)                      'extract first digit
                arr(i, 2) = extractNo(IIf(dblQ > 0, dblQ, sQ) - 2, CStr(arr(i, 1)), No, True) 'call the function which make extraction by (backward) iteration
            End If
        ElseIf strIn > 0 Then 'if "in" exists:
            If Mid(arr(i, 1), strIn + 2, 1) = " " Or Mid(arr(i, 1), strIn + 2, 2) = "ch" Or strIn + 1 = Len(arr(i, 1)) Then
                If Mid(arr(i, 1), strIn - 1, 1) = " " Then
                    arr(i, 2) = extractNo(strIn - 2, CStr(arr(i, 1)), No, True)
                Else
                    arr(i, 2) = extractNo(strIn - 1, CStr(arr(i, 1)), No, True)
                End If
            End If
        Else
            arr(i, 2) = extractNo(0, CStr(arr(i, 1)), "")
        End If
   Next i
   
   'drop the processed arran content back in its range:
   sh.Range("A2").Resize(UBound(arr), UBound(arr, 2)).Value2 = arr
End Sub

Function extractNo(pos As Long, str As String, No As String, Optional boolChar = False) As Variant
   Dim i As Long, boolNo As Boolean
   
   On Error GoTo WrongPatt
   If boolChar Then 'if one of the searched characters has been found:
        For i = pos To 1 Step -1
            If IsNumeric(Mid(str, i, 1)) Or Mid(str, i, 1) = "." Then
                No = CStr(Mid(str, i, 1)) & No
            Else
                extractNo = CDbl(No): Exit For
            End If
        Next i
  Else 'if no searched string has been found:
        For i = 1 To Len(str)
            If IsNumeric(Mid(str, i, 1)) Then
                boolNo = True
                No = No & Mid(str, i, 1)
            Else
                If boolNo Then Exit For
            End If
        Next i
        If Len(No) <= 2 And No <> "" Then
                extractNo = CLng(No)
        Else
                extractNo = ""
        End If
  End If
  Exit Function
  
WrongPatt:
   extractNo = "Wrong pttern"
End Function

但是上面的代码只会处理你在问题中显示的模式字符串。例如,如果有更多的双引号字符,其目的不同在前面有一个数字的字符之前),代码将只处理第一个找到的字符串。如果搜索的字符串是字符串中的第一个**等等,可能会有问题…它可以适应更多的情况,但我们这里不是心灵读者涵盖这样的未显示的情况...

相关问题