Get every word ending with a dot using Regex/VBA

Posted by pieyvz9o on 2022-11-18 in Other


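I am using Excel 2019 and I'm trying to extract, from a bunch of messy text cells, any (up to 5) words ending with a dot that come after a ].
This is a sample of the text I'm trying to parse/clean: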

some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan.

I expect to get this: ost. ult. lot. sino. collan.
I'm using this function, found somewhere on the internet, which seems to do the job:

Public Function RegExtract(Txt As String, Pattern As String) As String

With CreateObject("vbscript.regexp")
    '.Global = True
    .Pattern = Pattern
    If .test(Txt) Then
        RegExtract = .Execute(Txt)(0)
    Else
        RegExtract = "No match found"
    End If
End With

End Function

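I call it in an empty cell with: =RegExtract(D2; "([]])(\s\w+[.]){0,5}")
This is my first time using regular expressions, so I may have done things that look terrible to an expert. This is my expression: ([]])(\s\w+[.]){0,5}
Right now it returns only: ] ost.
That is much more than I expected from my first attempt at regex, but:
1. I can't get rid of the first ], which is needed to find where my useful bit starts inside the text block, because \K does not work in Excel. I could "find and replace" it afterwards like a clever barbarian, but I'd like to know how to do it cleanly, if a clean way exists :)
2. I don't understand how the quantifier works to get all of my "up to 5 occurrences": I expected {0,5} after the second group to mean exactly "repeat the previous group until the end of the text block (or until you have managed to do it 5 times)".
Thank you for your time :)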
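-- Added after accepting JdvD's answer, for the record --
I'm using this pattern to get every word ending with a dot after the first occurrence of a closing bracket.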

^.*?\]|(\w+\.\s?)|.

This one (without the question mark) gets every word ending with a dot after the last occurrence of a closing bracket.

^.*\]|(\w+\.\s?)|.
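To make the difference between the lazy prefix (^.*?\]) and the greedy prefix (^.*\]) concrete, here is a minimal sketch that runs both patterns over a made-up string containing two closing brackets. The names DottedWords and CompareLazyAndGreedy and the sample string are assumptions for illustration only; they are not part of the original post.

```vba
' Sketch only: compares the lazy and greedy prefixes on a string with two "]".
' DottedWords and CompareLazyAndGreedy are hypothetical names.
Function DottedWords(ByVal s As String, ByVal pat As String) As String
    Dim re As Object, m As Object, result As String

    Set re = CreateObject("VBScript.RegExp")   ' late binding, no reference needed
    re.Global = True
    re.Pattern = pat

    For Each m In re.Execute(s)
        ' Only matches where group 1 took part are wanted words.
        If Not IsEmpty(m.SubMatches(0)) Then
            result = result & " " & Trim$(m.SubMatches(0))
        End If
    Next m
    DottedWords = Trim$(result)
End Function

Sub CompareLazyAndGreedy()
    Const s As String = "a [x.] one. b [y.] two. three."
    Debug.Print DottedWords(s, "^.*?\]|(\w+\.\s?)|.")   ' words after the first "]"
    Debug.Print DottedWords(s, "^.*\]|(\w+\.\s?)|.")    ' words after the last "]"
End Sub
```

With this sample, the lazy version should print "one. y. two. three." and the greedy version "two. three.", because the greedy prefix swallows everything up to the second bracket.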

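I was also missing something in the RegExtract function: I needed to store the matches in an array through a For loop and then output that array as a string. I had wrongly assumed that the regex engine already stored the matches as a single string.
The correct RegExtract function, which extracts every match, is the following: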

Public Function RegExtract(Txt As String, Pattern As String) As String

Dim rMatch As Object, arrayMatches(), i As Long

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = Pattern
    If .Test(Txt) Then
        For Each rMatch In .Execute(Txt)
            If Not IsEmpty(rMatch.SubMatches(0)) Then
                ReDim Preserve arrayMatches(i)
                arrayMatches(i) = rMatch.SubMatches(0)
                i = i + 1
            End If
        Next
        RegExtract = Join(arrayMatches, " ")
    Else
        RegExtract = "No match found"
    End If
End With

End Function


Source: https://stackoverflow.com/questions/74382754/get-every-word-ending-with-dot-using-regex-vba

3 Answers


8yparm6h1#

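Regex match:

In addition to the answer given by @RonRosenfeld, you could also apply what is known as 'The Best Regex Trick Ever': first match what you don't want, then capture what you do want in a capture group. For example: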

^.*\]|(\w+\.)

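See an online demo; in short, this means:

^.*\] - match 0+ (greedy) characters from the start of the string up to the last occurrence of a closing square bracket;
| - or;
(\w+\.) - a capture group holding 1+ (greedy) word characters followed by a dot.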
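The way the two alternatives share the work can be seen by listing every match together with the state of its capture group. The following is a minimal sketch under the assumption that late binding via CreateObject is available; ShowMatches is a hypothetical name and the sample string is a shortened version of the one in the question.

```vba
' Sketch only: lists every match of the pattern and whether group 1 took part.
' ShowMatches is a hypothetical name.
Sub ShowMatches()
    Dim re As Object, m As Object
    Const s As String = "some text [asred.] ost. |Monday - Ribben (ult.) lot."

    Set re = CreateObject("VBScript.RegExp")   ' late binding, no reference needed
    re.Global = True
    re.Pattern = "^.*\]|(\w+\.)"

    For Each m In re.Execute(s)
        If IsEmpty(m.SubMatches(0)) Then
            ' First alternative matched: the text up to the last "]" is consumed and ignored.
            Debug.Print "skipped: " & m.Value
        Else
            ' Second alternative matched: group 1 holds a word ending with a dot.
            Debug.Print "kept:    " & m.SubMatches(0)
        End If
    Next m
End Sub
```

For this sample the Immediate window should show one skipped match ("some text [asred.]") followed by the kept words ost., ult. and lot.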

Here is how it could work in a UDF:

Sub Test()

Dim s As String: s = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan. "

Debug.Print RegExtract(s, "^.*\]|(\w+\.)")

End Sub

'------

'The above Sub would invoke the below function as an example.
'But you could also invoke this through: `=RegExtract(A1,"^.*\]|(\w+\.)")`
'on your sheet.

'------

Public Function RegExtract(Txt As String, Pattern As String) As String

Dim rMatch As Object, arrayMatches(), i As Long

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = Pattern
    If .Test(Txt) Then
        For Each rMatch In .Execute(Txt)
            If Not IsEmpty(rMatch.SubMatches(0)) Then
                ReDim Preserve arrayMatches(i)
                arrayMatches(i) = rMatch.SubMatches(0)
                i = i + 1
            End If
        Next
        RegExtract = Join(arrayMatches, " ")
    Else
        RegExtract = "No match found"
    End If
End With

End Function

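Regex replace:

Depending on your desired output, you could also use a replace function. In that case you have to match every remaining character with another alternative. For example: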

^.*\]|(\w+\.\s?)|.

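Put simply, this adds one more alternative, which is just any single character, so everything that is not captured gets replaced with nothing. A second small addition is an optional whitespace character, \s?, inside the second alternative.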

Sub Test()

Dim s As String: s = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan. "

Debug.Print RegReplace(s, "^.*\]|(\w+\.\s?)|.", "$1")

End Sub

'------

'There are now 3 parameters to pass to the UDF: String, Pattern and Replacement.

'------

Public Function RegReplace(Txt As String, Pattern As String, Replacement As String) As String

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = Pattern
    RegReplace = Trim(.Replace(Txt, Replacement))
End With

End Function

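Note that I used Trim() to remove possible trailing spaces.
Both RegExtract and RegReplace currently return a single string that cleans your input, but the former does give you the option of working with the array held in the arrayMatches() variable.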
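As one possible way of working with that array, the sketch below returns the captured words as an array and stops after a maximum number of them, which is what the "up to 5" in the question asks for. RegExtractArray and its MaxCount parameter are hypothetical names introduced here; they are not part of the answer above.

```vba
' Sketch only: RegExtractArray and MaxCount are hypothetical names.
Public Function RegExtractArray(Txt As String, Pattern As String, _
                                Optional MaxCount As Long = 5) As Variant
    Dim rMatch As Object, arrayMatches() As String, i As Long

    With CreateObject("vbscript.regexp")
        .Global = True
        .Pattern = Pattern
        For Each rMatch In .Execute(Txt)
            If Not IsEmpty(rMatch.SubMatches(0)) Then
                ReDim Preserve arrayMatches(i)
                arrayMatches(i) = rMatch.SubMatches(0)
                i = i + 1
                If i >= MaxCount Then Exit For   ' stop once the cap is reached
            End If
        Next
    End With

    If i = 0 Then
        RegExtractArray = Array()        ' empty array when nothing was captured
    Else
        RegExtractArray = arrayMatches
    End If
End Function
```

From VBA it could be used as Join(RegExtractArray(s, "^.*\]|(\w+\.)"), " ") to get the same space-separated string as before, only capped at five words.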


bz4sfanl2#

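There is a method to return all the matches in a string that come after a certain pattern, but I can't recall it at the moment.
In the meantime, the simplest approach seems to be to remove everything before the first ] and then apply the regex to the remainder.
For example: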

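Option Explicit
' Early binding: requires a reference to "Microsoft VBScript Regular Expressions 5.5" (Tools > References).
Sub findit()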
  Const str As String = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan."
  Dim RE As RegExp, MC As MatchCollection, M As Match
  Dim S As String
  Dim sOutput As String
  
S = Mid(str, InStr(str, "]"))

Set RE = New RegExp
With RE
    .Pattern = "\w+(?=\.)"
    .Global = True
    If .Test(S) = True Then
        Set MC = .Execute(S)
        For Each M In MC
            sOutput = sOutput & vbLf & M
        Next M
    End If
End With

MsgBox Mid(sOutput, 2)

End Sub

You can, of course, limit the number of matches to 5 by using a counter instead of the For Each loop.
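A minimal sketch of that counter idea follows. It uses late binding so no library reference is needed, and it adds a guard for text without a closing bracket. FindFirstFive, MAX_MATCHES and the two extra words appended to the sample are assumptions made for illustration.

```vba
' Sketch only: caps the output at five matches using an index loop.
' FindFirstFive and MAX_MATCHES are hypothetical names.
Sub FindFirstFive()
    Const MAX_MATCHES As Long = 5          ' cap taken from the question
    Const src As String = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan. extra. more."
    Dim RE As Object, MC As Object
    Dim S As String, sOutput As String
    Dim i As Long, pos As Long

    pos = InStr(src, "]")
    If pos = 0 Then Exit Sub               ' no closing bracket: nothing to extract
    S = Mid$(src, pos)

    Set RE = CreateObject("VBScript.RegExp")
    RE.Pattern = "\w+(?=\.)"               ' word characters followed by a dot (dot not included)
    RE.Global = True
    Set MC = RE.Execute(S)

    For i = 0 To MC.Count - 1
        If i >= MAX_MATCHES Then Exit For  ' counter-based limit
        sOutput = sOutput & vbLf & MC.Item(i).Value
    Next i

    MsgBox Mid$(sOutput, 2)                ' drop the leading line feed
End Sub
```

With the sample above, the message box should list ost, ult, lot, sino and collan, while the two trailing words are left out by the cap.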


x6h2sr283#

You can use the following regex:

([a-zA-Z]+)\.

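Let me explain.
[a-zA-Z] - matches any single letter from a to z or from A to Z.
+ - repeats the previous token, so the match keeps consuming letters until it reaches a character that is not a letter from a to z or from A to Z.
\. - matches the literal dot at the end of the match.
Here is an example.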
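A small sketch of how this pattern behaves in VBA is shown below; ShowLettersBeforeDot is a hypothetical name and the sample is a shortened version of the string from the question. Note that this pattern does not look at the closing bracket at all, so words before the ] are matched too.

```vba
' Sketch only: prints each full match and the letters captured by group 1.
' ShowLettersBeforeDot is a hypothetical name.
Sub ShowLettersBeforeDot()
    Dim re As Object, m As Object
    Const s As String = "some text [asred.] ost. |Monday - Ribben (ult.) lot."

    Set re = CreateObject("VBScript.RegExp")   ' late binding, no reference needed
    re.Global = True
    re.Pattern = "([a-zA-Z]+)\."

    For Each m In re.Execute(s)
        ' m.Value includes the dot; SubMatches(0) holds only the letters.
        Debug.Print m.Value, m.SubMatches(0)
    Next m
End Sub
```

For this sample the matches should be asred., ost., ult. and lot., so the text before the bracket would still need to be removed first to get the output the question asks for.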

