regex: extracting predefined URL text (all starting with the same prefix) into an Array

qnyhuwrf  posted on 2023-06-07 in Other

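I have text made up of file URLs, like this example:
http://xxxxx.pdfhttp://xxxxxxxxxxx.dochttp://xxxxxxxxxxxxx.xls
The separator between the URLs is not a space; they may be separated by vbTab, vbLf, or some other delimiter.
In all cases, though, each URL starts with the same prefix "http:" and ends with a dot followed by three characters.
I need to extract these predefined URLs into an Array so that I can work with them later.
As a workaround I wrote the code below, which uses two arrays. I also have to start the loop at the second element of the first array, because I found an extra "http" in the second array.
My question: is there more concise code for this?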

Sub Split_URLs_to_Array()

  Dim wDoc As Word.Document, rngSel As String
  Dim arrS, arrD, i As Long
  
  Set wDoc = Application.ActiveInspector.WordEditor
  
  rngSel = Trim(wDoc.Windows(1).Selection.Text)

  arrS = Split(rngSel, "http")
  
  ReDim arrD(0 To UBound(arrS))
  
  For i = 1 To UBound(arrS)
    arrD(i) = "http" & arrS(i)
    Debug.Print arrD(i)
  Next

End Sub
jecbmhm3  #1

1. You asked (yesterday) for a way to extract URLs from a string that has no separators at all... The next function does that:

Function SplitByStartOfString(strTxt As String, strDelim As String) As Variant
  Dim arr: arr = Split(strTxt, strDelim)
  
  arr(0) = "@#$%^": arr = filter(arr, "@#$%^", False) 'eliminate the first empty element
  SplitByStartOfString = Split(strDelim & Join(arr, "|" & strDelim), "|")
End Function

It can be tested as follows:

Sub testSplitByStartOfString()
   Dim x As String: x = "https://myurl1/x.pdfhttps://myurl2/y.xlsxhttps://myurl3/z.docx"
   Dim arr
   arr = SplitByStartOfString(x, "https:")
   Debug.Print Join(arr, "||") 'just to visually see the array result.
End Sub

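Of course, you can use any string you know of; the delimiter is simply the prefix common to all of its parts.
If end-of-line separators or vbTab are present, the code above still works, but those separators end up at the end of each extracted string. For URLs in Outlook that does not matter: they are simply laid out using the respective separators.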
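
If the trailing separators do matter for later processing, they can be stripped after splitting. The following is only a minimal sketch: `TrimTrailingSeparators` is a hypothetical helper name, not part of the original answer, and the set of characters it removes (vbTab, vbLf, vbCr, space) is an assumption.

```vba
' Hypothetical helper (not from the original answer).
' Returns a copy of a string array with trailing vbTab, vbLf, vbCr
' and space characters removed from every element.
Function TrimTrailingSeparators(ByVal arr As Variant) As Variant
    Dim i As Long, s As String

    For i = LBound(arr) To UBound(arr)
        s = arr(i)
        ' drop one trailing character at a time while it is a separator
        Do While Len(s) > 0
            Select Case Right$(s, 1)
                Case vbTab, vbLf, vbCr, " "
                    s = Left$(s, Len(s) - 1)
                Case Else
                    Exit Do
            End Select
        Loop
        arr(i) = s
    Next i

    TrimTrailingSeparators = arr   ' arr was passed ByVal, so the caller's array is untouched
End Function
```

It could be called as `arr = TrimTrailingSeparators(SplitByStartOfString(x, "https:"))`.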
1.1 Another possible function, this time using FilterXML:

Function splitXMLByStartOfString(strText As String, strDelim As String) As Variant
    Dim XML As String: XML = "<t><s>" & VBA.Replace(strText, strDelim, "</s><s>" & strDelim) & "</s></t>"
    splitXMLByStartOfString = Application.FilterXML(XML, "//s[position()>1]")  'nodes starting from the second one...
    'splitXMLByStartOfString = Application.FilterXML(XML, "//s[count(node())>0]")   'another working way (all not empty nodes)
    'splitXMLByStartOfString = Application.FilterXML(XML, "//s[starts-with(., '" & strDelim & "')]") 'working way, too (nodes starting with strDelim)
End Function

It can be tested using the next sub:

Sub TestFilterXMLHttp()
   Dim x As String: x = "https://myurl1/x.pdfhttps://myurl2/y.xlsxhttps://myurl3/z.docx"
   Dim arr: arr = splitXMLByStartOfString(x, "https:") 'It returns a 2D, 1 column array...
  
   Debug.Print Join(Application.Transpose(arr), "||")
End Sub
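
One caveat that the answer does not cover, stated here as an assumption about typical input: FilterXML needs well-formed XML, so a URL containing `&` (for example in a query string), `<` or `>` would make the call fail. A minimal sketch of an escaping step follows; `EscapeForXml` is a hypothetical helper name.

```vba
' Hypothetical helper (not from the original answer).
' Escapes the characters that would make the generated XML ill-formed.
Function EscapeForXml(ByVal s As String) As String
    s = Replace(s, "&", "&amp;")   ' must run first, or it would re-escape the entities added below
    s = Replace(s, "<", "&lt;")
    s = Replace(s, ">", "&gt;")
    EscapeForXml = s
End Function
```

The text would be escaped before it is handed over, as in `splitXMLByStartOfString(EscapeForXml(x), "https:")`. The returned node text should come back with the entities decoded, but that is worth checking on real data.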

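The function above works well and fast, but only in Excel. The question did not mention that it has to run in Outlook VBA (but I know that...). So the next solution automates Excel from Outlook and calls FilterXML through Excel.Application. This version needs an Excel session that is already open, but it can easily be adapted to start a new session when none is open: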

Function splitXMLByStartOfString(strText As String, strDelim As String, objEx As Object) As Variant
   
    Dim XML As String: XML = "<t><s>" & VBA.Replace(strText, strDelim, "</s><s>" & strDelim) & "</s></t>"
    splitXMLByStartOfString = objEx.FilterXML(XML, "//s[position()>1]")  'nodes starting from the second one...
    'splitXMLByStartOfString = objEx.FilterXML(XML, "//s[count(node())>0]")   'another working way (all not empty nodes)
    'splitXMLByStartOfString = objEx.FilterXML(XML, "//s[starts-with(., '" & strDelim & "')]") 'working way, too (nodes starting with strDelim)
End Function

And a sub to test it, using the automation above:

Sub TestFilterXMLHttp()
   Dim objEx As Object: Set objEx = GetObject(, "Excel.application")
   Dim x As String: x = "https://myurl1/x.pdfhttps://myurl2/y.xlsxhttps://myurl3/z.docx"
   Dim arr: arr = splitXMLByStartOfString(x, "https:", objEx) 'It returns a 2D, 1 column array...
  
   Debug.Print Join(objEx.Transpose(arr), "||")
End Sub
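
The adaptation mentioned above (start Excel when no session is open) might look like the sketch below. `GetExcelApp` and its `startedHere` flag are hypothetical names introduced for illustration; only `GetObject` and `CreateObject` are standard VBA calls.

```vba
' Hypothetical helper (not from the original answer).
' Returns a running Excel instance, or starts a new one if none is open.
' startedHere tells the caller whether it should quit Excel afterwards.
Function GetExcelApp(ByRef startedHere As Boolean) As Object
    Dim objEx As Object

    On Error Resume Next                      ' GetObject raises an error when Excel is not running
    Set objEx = GetObject(, "Excel.Application")
    On Error GoTo 0

    If objEx Is Nothing Then
        Set objEx = CreateObject("Excel.Application")
        startedHere = True
    Else
        startedHere = False
    End If

    Set GetExcelApp = objEx
End Function
```

In the test sub, `Set objEx = GetExcelApp(started)` would replace the `GetObject` line, followed by `If started Then objEx.Quit` once the array has been read.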

2. When the elements of the string are separated by a delimiter, the array can be extracted with the next function:

Function extractFromStringSep(strText As String) As Variant
   Dim arrC: arrC = Array(vbTab, vbLf) 'you can extend the supposed separators...
   Dim El
   
   For Each El In arrC
        If InStr(strText, El) > 0 Then
            extractFromStringSep = Split(strText, El): Exit Function
        End If
   Next El
   extractFromStringSep = Array(strText) 'no separator found: return the whole text as a single element
End Function

Of course, the strings to be extracted must not themselves contain the supposed separators...
It can be tested as in the next sub:

Sub TestextractFromStringSep()
   Dim x As String: x = "https://myurl1/x.pdf" & vbTab & "https://myurl2/y.xlsx" & vbTab & "https://myurl3/z.docx"
          'x = "https://myurl1/x.pdf" & vbLf & "https://myurl2/y.xlsx" & vbLf & "https://myurl3/z.docx"
   Dim arr
   arr = extractFromStringSep(x)
   Debug.Print Join(arr, "||") 'just to visually see the array result.
End Sub
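
The function above splits on the first separator it finds, so text that mixes vbTab and vbLf would only be partly split. A minimal sketch of one way around that, assuming every separator can be normalised to vbLf first; `SplitOnAnySeparator` is a hypothetical name, not part of the original answer.

```vba
' Hypothetical helper (not from the original answer).
' Splits on any mix of vbCrLf, vbCr, vbTab and vbLf and drops empty elements.
Function SplitOnAnySeparator(ByVal strText As String) As Variant
    Dim seps: seps = Array(vbCrLf, vbCr, vbTab)   ' everything listed here is turned into vbLf
    Dim sep, parts, p
    Dim result() As String, n As Long

    For Each sep In seps
        strText = Replace(strText, sep, vbLf)
    Next sep

    parts = Split(strText, vbLf)
    If UBound(parts) < 0 Then                     ' empty input
        SplitOnAnySeparator = Split(vbNullString)
        Exit Function
    End If

    ReDim result(0 To UBound(parts))
    For Each p In parts
        If Len(p) > 0 Then                        ' skip blanks left by adjacent separators
            result(n) = p
            n = n + 1
        End If
    Next p

    If n = 0 Then
        SplitOnAnySeparator = Split(vbNullString) ' zero-length array
    Else
        ReDim Preserve result(0 To n - 1)
        SplitOnAnySeparator = result
    End If
End Function
```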

3. A last version, which also allows part of the URL string to act as the separator (for didactic purposes only):

Function extractFromStrAndSep(strText As String, strDelim As String) As Variant
     Dim arrC: arrC = Array(vbTab, vbLf, "myur") 'you can extend the supposed separators...
     Dim El
     
     For Each El In arrC
        If InStr(strText, El & strDelim) > 0 Then
            extractFromStrAndSep = Split(strText, El & strDelim): Exit Function
        End If
     Next El
     extractFromStrAndSep = Array(strText) 'no separator found: return the whole text as a single element
End Function

It can be tested using the next (adapted) sub:

Sub TestextractFromStrAndSep()
   Dim x As String: 'x = "https://myurl1/x.pdf" & vbTab & "https://myurl2/y.xlsx" & vbTab & "https://myurl3/z.docx"
          'x = "https://myurl1/x.pd" & vbLf & "https://myurl2/y.xlsx" & vbLf & "https://myurl3/z.docx"
          x = "https://myurl1/x.pd" & "myur" & "https://myurl2/y.xlsx" & "myur" & "https://myurl3/z.docx"
   Dim arr
   arr = extractFromStrAndSep(x, "https:")
   Debug.Print Join(arr, "||") 'just to visually see the array result.
End Sub
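
The title asks about regex, which none of the versions above use. The sketch below is one possible regex approach, not the answerer's method. It assumes Windows (the late-bound `VBScript.RegExp` object), a prefix without regex special characters, and the rule from the question that every URL ends with a dot plus three characters. `ExtractUrlsRegex` is a hypothetical name.

```vba
' Hypothetical function (not from the original answer).
' Returns every URL that starts with strPrefix and ends with ".xxx",
' whether the URLs are glued together or separated by whitespace.
Function ExtractUrlsRegex(ByVal strText As String, _
                          Optional ByVal strPrefix As String = "http:") As Variant
    Dim re As Object: Set re = CreateObject("VBScript.RegExp")
    Dim matches As Object, i As Long
    Dim result() As String

    re.Global = True
    re.IgnoreCase = True
    ' prefix, then the shortest run of non-whitespace characters ending in
    ' a dot + three word characters, provided the next thing is either
    ' another prefix or the end of the text (optional whitespace in between)
    re.Pattern = strPrefix & "\S*?\.\w{3}(?=\s*" & strPrefix & "|\s*$)"

    Set matches = re.Execute(strText)
    If matches.Count = 0 Then
        ExtractUrlsRegex = Split(vbNullString)    ' zero-length array
        Exit Function
    End If

    ReDim result(0 To matches.Count - 1)
    For i = 0 To matches.Count - 1
        result(i) = matches.Item(i).Value
    Next i
    ExtractUrlsRegex = result
End Function
```

For the answer's test strings, which use four-character extensions such as `.xlsx`, the `\w{3}` part would have to be widened (for example to `\w{3,4}`), and the prefix passed as `"https:"`.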
