regex: Regular expressions with hyperlinks, ActiveDocument.Range, and formatting

ryevplcw  posted on 2023-08-08  in Other

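Tobias's answer seems to be the ticket. I just wanted to add that I've realized quantifiers are meaningless inside a character class. I also noticed that a colleague's emails often have a space after the dollar sign and before the number, so here is a somewhat better regex (for dollar amounts):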

RegExp.Pattern = "\$\s*([\,\d]*(?:\.\d{2})?)"
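
To illustrate the point about quantifiers inside a character class, here is a small sketch in Python rather than VBA, so it can run outside Word. The two patterns are copied from this thread; the sample string and the helper name `amounts` are made up for illustration. Python's `re` treats these particular patterns the same way as far as I know, but that equivalence with the VBScript engine is an assumption.

```python
import re

# The two patterns from the thread, unchanged.
OLD = r"\$([\,\d{1,3}]*(?:\.\d{2})?)"   # "{1,3}" sits inside [...], so it is literal
NEW = r"\$\s*([\,\d]*(?:\.\d{2})?)"     # allows whitespace after the dollar sign


def amounts(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]


# Hypothetical sample: a normal amount, an amount with a space, and a
# brace sequence that only the old character class accepts.
sample = "Fees: $1,250.00, $ 75 and ${1}3"

print(amounts(OLD, sample))  # ['$1,250.00', '$', '${1}3']
print(amounts(NEW, sample))  # ['$1,250.00', '$ 75', '$']
```

Both patterns still match a bare `$`, because every part after the dollar sign is optional, so the regex stays intentionally loose.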

Taking some inspiration from "What does a hyperlink range.start and range.end refer to?", I came up with this:

Sub trueUpAttempt()
Dim OrigLength As Long
Debug.Print ActiveDocument.Characters.Count

Dim SelStart As Long
Dim SelEnd As Long
Dim SelLength As Long

Dim rHyperlink As Range
Dim wdHyperlink As Hyperlink
    For Each wdHyperlink In ActiveDocument.Hyperlinks
        Set rHyperlink = wdHyperlink.Range
        'Debug.Print rHyperlink.Start
        'Debug.Print rHyperlink.End
        'Debug.Print rHyperlink.End - rHyperlink.Start
        Debug.Print rHyperlink.End - rHyperlink.Start - Len(rHyperlink)
        'there's got to be some way to true up the character offset, even if its ugly
        Debug.Print ActiveDocument.Characters.Count + rHyperlink.End - rHyperlink.Start - Len(rHyperlink)
    Next
End Sub


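This is not a fix, but I think it outlines how to reconcile the character offsets. The whole problem arises because Word counts all 62 characters of, for example, {HYPERLINK "http://www.smithany.com"} http://www.smithany.com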
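
The drift can be pictured with a toy model, sketched here in Python so it runs without Word. This is not Word's actual storage format: the marker characters, the sample text, and the function name `visible_to_story_map` are all assumptions made for illustration. The idea is only that the regex sees the visible text, while range offsets also count the hidden field code, so each visible index has to be mapped back to its position in the full character stream.

```python
import re

# Assumed markers for this toy model: field begin, code/result separator, field end.
FIELD_BEGIN, FIELD_SEP, FIELD_END = "\x13", "\x14", "\x15"


def visible_to_story_map(story):
    """Return (visible_text, positions); positions[i] is the index in
    story of the i-th visible character. Nested fields are not handled."""
    visible, positions = [], []
    in_code = False
    for idx, ch in enumerate(story):
        if ch == FIELD_BEGIN:
            in_code = True          # hidden field code starts
        elif ch == FIELD_SEP:
            in_code = False         # visible field result starts
        elif ch == FIELD_END:
            continue                # the closing marker itself is hidden
        elif not in_code:
            visible.append(ch)
            positions.append(idx)
    return "".join(visible), positions


story = ("See " + FIELD_BEGIN + ' HYPERLINK "http://www.smithany.com" '
         + FIELD_SEP + "http://www.smithany.com" + FIELD_END + " costs $12.50")

visible, positions = visible_to_story_map(story)
m = re.search(r"\$\s*[\,\d]*(?:\.\d{2})?", visible)

# Map the match from visible-text indices to full-stream indices.
start = positions[m.start()]
end = positions[m.end() - 1] + 1
drift = start - m.start()   # number of hidden characters before the match
```

In this model, `story[start:end]` is the matched amount, while `story[m.start():m.end()]` lands inside the hidden field code, which is the kind of shifted highlight described in this thread.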

Edit, 22 July 2023, trying Tobias's suggestion:

Sub DollarHighlighter2()
Set regExp = New regExp
Dim objMatch As Match
Dim colMatches As MatchCollection
Dim offsetEnd As Long
offsetEnd = Selection.End
regExp.Pattern = "\$([\,\d{1,3}]*(?:\.\d{2})?)"
regExp.Global = True
Set allMatches = regExp.Execute(Selection.text)   ' Execute search.
For i = allMatches.Count - 1 To 0 Step -1
    'MsgBox allMatches.Item(i)
    ActiveDocument.Range(offsetEnd - allMatches.Item(i).FirstIndex, End:=offsetEnd - allMatches.Item(i).FirstIndex + allMatches.Item(i).Length).FormattedText.HighlightColorIndex = wdYellow
Next
End Sub


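But this still seems to have a similar problem with links, and perhaps with other content. I also tried computing the range in the forward direction while looping through the matches in reverse, and ran into similar problems.
Working link to an example file (no SSL): http://www.smithany.com/exampleDollarHighliter.docx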

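**Original post:** I have looked at several other StackOverflow posts, such as this one: "How to Use/Enable (RegExp object) Regular Expression using VBA (MACRO) in word", on using regular expressions in Microsoft Word with VBA via the Microsoft VBScript Regular Expressions 5.5 reference.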

That helped me put together the following, which I use in Word to highlight dollar amounts:

Sub dollarHighlighter()
Set regExp = New regExp
Dim objMatch As Match
Dim colMatches As MatchCollection
Dim offsetStart As Long
offsetStart = Selection.Start
regExp.Pattern = "\$([\,\d{1,3}]*(?:\.\d{2})?)"
regExp.Global = True
Set colMatches = regExp.Execute(Selection.Text)   ' Execute search.
For Each objMatch In colMatches   ' Iterate Matches collection.
  ' The trailing underscore continues the statement on the next line.
  Set myRange = ActiveDocument.Range(objMatch.FirstIndex + offsetStart, _
    End:=offsetStart + objMatch.FirstIndex + objMatch.Length)
  myRange.FormattedText.HighlightColorIndex = wdYellow
Next
End Sub


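While this works as expected on a list of dollar amounts in plain text (for the most part; the regex is intentionally a bit loose, imperfections and all), it does not work as expected when the Word document contains hyperlinks.
In that case, the offsets of the highlighted characters seem to shift in some unpredictable way. I assume this is because there is a lot of new xml/css in the document.xml source file.
Ultimately, my overarching question is: can I use regex to highlight Word document content even if it contains hyperlinks? Is this an offset problem, or should I run the regex on the compressed XML, re-compress it, and reopen the file to get better results? When I test the various regex variants against the source I get the expected results, but not when formatting Word ranges.
I also asked here: https://social.msdn.microsoft.com/Forums/en-US/3a95c5e4-9e0c-4da9-970f-e0bf801c3170/macro-for-a-regexp-search-replace?forum=isvvba&prof=required but then realized that it is an old post...
Here are some links that may be useful:
sample document: http://www.smithany.com/test.docx
step 1: http://www.smithany.com/wordusd1.jpg
step 2: http://www.smithany.com/wordhighlighterrun.jpg
what actually happens: http://www.smithany.com/whatactuallyhappens.jpg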

**Temporary workaround:** As shown below, Word's wildcard Find is fast as long as you do not nest loops. Try this:

Sub Macro2()
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
Selection.Find.Replacement.highlight = True
With Selection.Find
    .Text = "$[0-9,]{1,}"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindContinue
    .Format = True
    .MatchCase = False
    .MatchWholeWord = False
    .MatchAllWordForms = False
    .MatchSoundsLike = False
    .MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
Selection.Find.Replacement.highlight = True
With Selection.Find
    .Text = "$[0-9,]{1,}.[0-9]{2,3}"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindContinue
    .Format = True
    .MatchCase = False
    .MatchWholeWord = False
    .MatchAllWordForms = False
    .MatchSoundsLike = False
    .MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll


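End Sub

That basically highlights all of the dollar amounts. That said, complex expressions, such as ones matching various date formats, could get messy, but I think it is entirely possible to handle them one step at a time.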

pobjuy32 #1

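I haven't touched VBA in many years, but I guess it's like riding a bike.
Anyway, here is a Sub that should help you. It is based on Cindy Meister's sound advice, and it bridges the gap between regex and wildcard Find by using a collection of match patterns to cover the optional part.
First, the wildcard patterns:
$[0-9,]{1,}
$[0-9,]{1,}.[0-9]{2}
Not really all that different, are they? However, to account for the optional fractional part, I had to use two patterns.
Here is the procedure: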

Sub WildcardsHighlightWords()
    Dim Word As Range
    Dim WildcardCollection(1) As String ' indices 0 and 1; (2) would add an empty third pattern
    Dim Words As Variant
    WildcardCollection(0) = "$[0-9,]{1,}"
    WildcardCollection(1) = "$[0-9,]{1,}.[0-9]{2}"
    Options.DefaultHighlightColorIndex = wdYellow
    'Clear existing formatting and settings in Find
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    'Set highlight to replace setting.
    Selection.Find.Replacement.Highlight = True
    'Cycle through document and find wildcards patterns, highlight words when found
    For Each Word In ActiveDocument.Words
        For Each WildcardsPattern In WildcardCollection
            With Selection.Find
                .Text = WildcardsPattern
                .Replacement.Text = ""
                .Forward = True
                .Wrap = wdFindContinue
                .Format = True
                .MatchCase = False
                .MatchWholeWord = False
                .MatchWildcards = True
                .MatchSoundsLike = False
                .MatchAllWordForms = False
            End With
            Selection.Find.Execute Replace:=wdReplaceAll
        Next
    Next
End Sub

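This approach should be easy to extend or modify if needed.
It highlights the dollar amounts as required on my end:
[image: screenshot of the highlighted dollar amounts]
Note: the separator in the quantifier {n,m} is not the same in all localizations; in the German version, for example, it is {n;m}.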

z9smfwbn #2

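**Update 26.07.2023:** If you go through your document paragraph by paragraph, you can easily work around all of these problems. However, this works in your case only because the regex matches stay within paragraph boundaries!

With that restriction in mind, the following VBA code will work: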

Sub DollarHighlighter4()
    
    '26.07.2023, works within tables
    Dim RegExp As RegExp
    Dim allMatches As MatchCollection
    Dim wdPar As Paragraph
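    ' Give each variable its own type; in "Dim a, b As T" only b is typed.
    Dim rngPar As Range, rngDoc As Range, rngFormat As Range
    Dim i As Long, intA As Long, intB As Long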
    
    Set rngDoc = ActiveDocument.Range
    
    Set RegExp = New RegExp
    RegExp.Pattern = "\$([\,\d{1,3}]*(?:\.\d{2})?)"
    RegExp.Global = True

    For Each wdPar In rngDoc.Paragraphs
        
        Set rngPar = wdPar.Range
        ' Get all matches, within current paragraph
        Set allMatches = RegExp.Execute(rngPar)
        
        ' Highlight all matches, within current paragraph
        For i = allMatches.Count - 1 To 0 Step -1
            intA = allMatches.Item(i).FirstIndex
            intB = intA + allMatches.Item(i).Length
            Set rngPar = wdPar.Range ' Always reset range to whole content
            Set rngFormat = wdPar.Range 'current Paragraph.Range
            ' Adjust text-range to actual regex-match
            ' Character-address refers to current paragraph
            rngFormat.SetRange Start:=rngPar.Characters(intA + 1).Start, _
                End:=rngPar.Characters(intB).End
            ' Perform action to range
            rngFormat.FormattedText.HighlightColorIndex = wdYellow
        Next

    Next wdPar
    
    'Finish
    Set rngFormat = Nothing
    Set rngPar = Nothing
    Set rngDoc = Nothing
    Set RegExp = Nothing
    Set allMatches = Nothing
    
End Sub
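
The restriction that matches must stay within a paragraph can be seen in a small sketch, written in Python so it runs outside Word. It assumes paragraphs are separated by a carriage return, and the sample text and the function name `matches_by_paragraph` are made up. Each offset is relative to the start of its own paragraph, which is what lets the VBA above address characters through the paragraph's range.

```python
import re

PATTERN = r"\$\s*[\,\d]*(?:\.\d{2})?"   # dollar-amount pattern from this thread


def matches_by_paragraph(text, pattern):
    """Return (paragraph_index, start, end) tuples; start and end are
    offsets relative to the beginning of that paragraph."""
    results = []
    for p_idx, para in enumerate(text.split("\r")):
        for m in re.finditer(pattern, para):
            results.append((p_idx, m.start(), m.end()))
    return results


# Hypothetical three-paragraph text.
text = "Rent $1,200.00\rFood $ 300\rTotal due"
print(matches_by_paragraph(text, PATTERN))  # [(0, 5, 14), (1, 5, 10)]
```

A pattern that could span a paragraph mark would never match here, because each paragraph is searched separately.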

@Allan: You should use YourVariable.SetRange, which lets you define the range based on character positions.
This should work:

Sub DollarHighlighter3()
Set regExp = New regExp
Dim objMatch As Match
Dim colMatches As MatchCollection
Dim offsetEnd As Long
Dim rngFormat As Range
Dim intA As Long, intB As Long ' Long: document offsets can exceed the Integer limit
regExp.Pattern = "\$([\,\d{1,3}]*(?:\.\d{2})?)"
regExp.Global = True
Set allMatches = regExp.Execute(ActiveDocument.Content)   ' Execute search.
For i = allMatches.Count - 1 To 0 Step -1
    intA = allMatches.Item(i).FirstIndex
    intB = intA + allMatches.Item(i).Length
    Set rngFormat = ActiveDocument.Range
    ' Characters is 1-based, so the match starts at character intA + 1.
    rngFormat.SetRange Start:=ActiveDocument.Range.Characters(intA + 1).Start, _
        End:=ActiveDocument.Range.Characters(intB).End
    rngFormat.FormattedText.HighlightColorIndex = wdYellow
Next
End Sub


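Yesterday (20 July 2017) I ran into the same problem: identifying text occurrences based on a regex pattern and converting them to hyperlinks.
What worked for me: solve it in reverse!
Once the regex match collection has been "Set", its index values are static and based on the original Word text. Inserting hyperlinks makes the text longer. So you either re-run the regex after every text operation (problem: what if the inserted hyperlink itself produces a match...), or you process the document from end to start. The latter can be done with a countdown loop that starts at the last regex occurrence.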
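
The reverse-order idea can be shown on a plain string. This is a minimal Python sketch, not Word code: the function names and the bracket markers are hypothetical stand-ins for inserting a hyperlink. Editing from the last match backwards never moves the text in front of the matches that are still pending, so their saved indices stay valid.

```python
import re

PATTERN = r"\$\s*[\,\d]*(?:\.\d{2})?"   # dollar-amount pattern from this thread


def wrap_reverse(text, pattern, prefix, suffix):
    """Wrap every match, working from the last match to the first."""
    for m in reversed(list(re.finditer(pattern, text))):
        text = text[:m.start()] + prefix + m.group(0) + suffix + text[m.end():]
    return text


def wrap_forward_naive(text, pattern, prefix, suffix):
    """Same edit in forward order, reusing the stale indices (for contrast)."""
    for m in list(re.finditer(pattern, text)):
        text = (text[:m.start()] + prefix + text[m.start():m.end()]
                + suffix + text[m.end():])
    return text


sample = "pay $5 or $10.00"
print(wrap_reverse(sample, PATTERN, "[", "]"))  # pay [$5] or [$10.00]
```

The forward variant wraps the first amount correctly and then misplaces the second one, because the first insertion shifted everything after it.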
