regex: Determine the total number of atoms in a chemical formula

bvuwiixz  posted on 2023-05-30  in  Other

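I have a list of more than a thousand chemical formulas, which may contain the symbol of any element. I want to determine the total number of atoms, across all elements, in each formula. Examples include: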

  • CH3NO3
  • cesium selenide
  • C2Cl2
  • dichlorodioxoethane
  • C2Cl3F
  • bromotrifluoroethane
  • C2H2Br2
  • C2H3Cl3Si

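I want to know the total number of atoms in a formula, so for the first example (CH3NO3) the answer is 8 (1 carbon + 3 hydrogen + 1 nitrogen + 3 oxygen).
I found code by PEH (Extract numbers from chemical formula) that uses a regular expression to extract the count of one specific element from a chemical formula.
Can it be modified to give the total number of atoms?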

Public Function ChemRegex(ChemFormula As String, Element As String) As Long
    Dim regEx As New RegExp
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With
    
    'first pattern matches every element once
    regEx.Pattern = "([A][cglmrstu]|[B][aehikr]?|[C][adeflmnorsu]?|[D][bsy]|[E][rsu]|[F][elmr]?|[G][ade]|[H][efgos]?|[I][nr]?|[K][r]?|[L][airuv]|[M][cdgnot]|[N][abdehiop]?|[O][gs]?|[P][abdmortu]?|[R][abefghnu]|[S][bcegimnr]?|[T][abcehilms]|[U]|[V]|[W]|[X][e]|[Y][b]?|[Z][nr])([0-9]*)"
    
    Dim Matches As MatchCollection
    Set Matches = regEx.Execute(ChemFormula)
    
    Dim m As Match
    For Each m In Matches
        If m.SubMatches(0) = Element Then
            ChemRegex = ChemRegex + IIf(Not m.SubMatches(1) = vbNullString, m.SubMatches(1), 1)
        End If
    Next m
    
    'second pattern finds parentheses and multiplies the elements within
    '([0-9]+) captures the whole multiplier, so two-digit multipliers are not truncated
    regEx.Pattern = "(\((.+?)\)([0-9]+)+)+?"
    Set Matches = regEx.Execute(ChemFormula)
    For Each m In Matches
        ChemRegex = ChemRegex + ChemRegex(m.SubMatches(1), Element) * (m.SubMatches(2) - 1) '-1 because all elements were already counted once in the first pattern
    Next m
End Function
bq9c1y66  1#

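You can do this by looping through all the characters. Count every uppercase character, and for every digit add its value minus 1 (the element it belongs to was already counted once through its uppercase letter). The result is the total number of atoms.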

Option Explicit

Public Function ChemCountTotalElements(ByVal ChemFormula As String) As Long
    Dim RetVal As Long

    Dim c As Long
    For c = 1 To Len(ChemFormula)
        Dim Char As String
        Char = Mid$(ChemFormula, c, 1)
        
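        If IsNumeric(Char) Then
            'a digit n means n atoms; one was already counted for the element itself
            RetVal = RetVal + CLng(Char) - 1
        ElseIf Char Like "[A-Z]" Then
            'every uppercase letter starts a new element
            '(assumes the default Option Compare Binary, so "(" or ")" is not counted)
            RetVal = RetVal + 1
        End If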
        
    Next c
    
    ChemCountTotalElements = RetVal
End Function

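Note that this does not handle parentheses, and it does not check whether an element actually exists, so XYZ2 is counted as 4.
It also handles only numbers below 10. If you have numbers of 10 and above, use the RegEx solution below (which can handle them).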
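
The single-digit limit comes from treating each digit on its own: in C10H8 the "1" adds 0 and the "0" adds -1, so the loop returns 8 instead of 18. As a minimal sketch, assuming you want to stay without RegEx, the digits after an element can be collected into one number first. The names ChemCountTotalElementsMultiDigit and CountOrOne are hypothetical, and like the loop above this variant ignores parentheses and does not validate element symbols.

```vba
Option Explicit

' Hypothetical helper: an empty digit run means a count of 1.
Private Function CountOrOne(ByVal Digits As String) As Long
    If Len(Digits) = 0 Then
        CountOrOne = 1
    Else
        CountOrOne = CLng(Digits)
    End If
End Function

' Hypothetical variant of the character loop that reads a run of digits
' as one number. Assumes the default Option Compare Binary.
Public Function ChemCountTotalElementsMultiDigit(ByVal ChemFormula As String) As Long
    Dim Total As Long
    Dim Digits As String        ' digits collected for the current element
    Dim HaveElement As Boolean  ' True once an uppercase letter has been seen
    Dim i As Long
    Dim Ch As String

    For i = 1 To Len(ChemFormula)
        Ch = Mid$(ChemFormula, i, 1)
        If Ch Like "[A-Z]" Then
            ' a new element starts: add the count of the previous one first
            If HaveElement Then Total = Total + CountOrOne(Digits)
            HaveElement = True
            Digits = vbNullString
        ElseIf Ch Like "[0-9]" And HaveElement Then
            Digits = Digits & Ch
        End If
        ' lowercase letters and any other characters are skipped
    Next i

    ' add the count of the last element
    If HaveElement Then Total = Total + CountOrOne(Digits)

    ChemCountTotalElementsMultiDigit = Total
End Function
```

With this sketch, ChemCountTotalElementsMultiDigit("C10H8") gives 18 and ChemCountTotalElementsMultiDigit("CH3NO3") gives 8. The helper uses If/Else rather than IIf because IIf evaluates both branches, and CLng("") would raise a type mismatch.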

The RegEx solution also recognizes chemical formulas with parentheses, such as Ca(OH)2.

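If you need a more precise method (one that checks that the elements exist) and want parentheses recognized, you need to use RegEx again.
Since VBA does not support regular expressions out of the box, we first need to reference a Windows library.
1. Under Tools, open References to add a reference to the regex library.

2. Select Microsoft VBScript Regular Expressions 5.5.

3. Add this function to a module: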

Public Function ChemRegexCountTotalElements(ByVal ChemFormula As String) As Long
    Dim RetVal As Long

    Dim regEx As New RegExp
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With

    'first pattern matches every element once
    regEx.Pattern = "([A][cglmrstu]|[B][aehikr]?|[C][adeflmnorsu]?|[D][bsy]|[E][rsu]|[F][elmr]?|[G][ade]|[H][efgos]?|[I][nr]?|[K][r]?|[L][airuv]|[M][cdgnot]|[N][abdehiop]?|[O][gs]?|[P][abdmortu]?|[R][abefghnu]|[S][bcegimnr]?|[T][abcehilms]|[U]|[V]|[W]|[X][e]|[Y][b]?|[Z][nr])([0-9]*)"

    Dim Matches As MatchCollection
    Set Matches = regEx.Execute(ChemFormula)

    Dim m As Match
    For Each m In Matches
        RetVal = RetVal + IIf(Not m.SubMatches(1) = vbNullString, m.SubMatches(1), 1)
    Next m

    'second pattern finds parentheses and multiplies the elements within
    regEx.Pattern = "(\((.+?)\)([0-9]+)+)+?"
    Set Matches = regEx.Execute(ChemFormula)
    For Each m In Matches
        RetVal = RetVal + ChemRegexCountTotalElements(m.SubMatches(1)) * (m.SubMatches(2) - 1) '-1 because all elements were already counted once in the first pattern
    Next m

    ChemRegexCountTotalElements = RetVal
End Function

Although this code also recognizes parentheses, note that it does not handle nested parentheses.
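
If nested parentheses matter for your data, one option is to drop the regex and walk the string recursively. The following is only a sketch under stated assumptions, not part of the answer's code: ChemCountNested, CountGroup and ReadMultiplier are hypothetical names, element symbols are not validated (any uppercase letter followed by lowercase letters counts as one element), and the default Option Compare Binary is assumed.

```vba
Option Explicit

' Hypothetical entry point: total atoms, nested parentheses allowed.
Public Function ChemCountNested(ByVal ChemFormula As String) As Long
    Dim Pos As Long
    Pos = 1
    ChemCountNested = CountGroup(ChemFormula, Pos)
End Function

' Counts atoms from Pos up to a closing parenthesis or the end of the string.
' Pos is passed ByRef so the caller continues where this call stopped.
Private Function CountGroup(ByVal Formula As String, ByRef Pos As Long) As Long
    Dim Total As Long
    Dim Units As Long
    Dim Ch As String

    Do While Pos <= Len(Formula)
        Ch = Mid$(Formula, Pos, 1)
        If Ch = "(" Then
            Pos = Pos + 1                       ' skip "("
            Units = CountGroup(Formula, Pos)    ' atoms inside the group
            Pos = Pos + 1                       ' skip ")"
            Total = Total + Units * ReadMultiplier(Formula, Pos)
        ElseIf Ch = ")" Then
            Exit Do                             ' the caller skips the ")"
        ElseIf Ch Like "[A-Z]" Then
            Pos = Pos + 1
            ' skip the lowercase part of the symbol, e.g. the "a" in "Ca"
            Do While Pos <= Len(Formula)
                If Not (Mid$(Formula, Pos, 1) Like "[a-z]") Then Exit Do
                Pos = Pos + 1
            Loop
            Total = Total + ReadMultiplier(Formula, Pos)
        Else
            Pos = Pos + 1                       ' ignore anything else
        End If
    Loop

    CountGroup = Total
End Function

' Reads an optional run of digits at Pos; returns 1 if there is none.
Private Function ReadMultiplier(ByVal Formula As String, ByRef Pos As Long) As Long
    Dim Digits As String

    Do While Pos <= Len(Formula)
        If Not (Mid$(Formula, Pos, 1) Like "[0-9]") Then Exit Do
        Digits = Digits & Mid$(Formula, Pos, 1)
        Pos = Pos + 1
    Loop

    If Len(Digits) = 0 Then
        ReadMultiplier = 1
    Else
        ReadMultiplier = CLng(Digits)
    End If
End Function
```

With this sketch, ChemCountNested("Ca(OH)2") gives 5 and ChemCountNested("Al2(SO4)3") gives 17. An unmatched ")" at the top level simply ends the count early, so malformed input is not reported as an error.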

kmb7vmvb  2#

Here are my two cents.

Formula in C1:

=ChemRegex(A1)

where ChemRegex() is defined as:

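Public Function ChemRegex(ChemFormula As String) As Long

'late-bound regex objects, so no library reference is needed;
'declared explicitly so the code also compiles under Option Explicit
Dim matches As Object, Match As Object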

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = "[A-Z][a-z]*(\d*)"
    If .Test(ChemFormula) Then
        Set matches = .Execute(ChemFormula)
        For Each Match In matches
            ChemRegex = ChemRegex + IIf(Match.Submatches(0) = "", 1, Match.Submatches(0))
        Next
    Else
        ChemRegex = 0
    End If
End With

End Function

Or as a (shorter) two-step regex solution:

'ByVal, because the function overwrites ChemFormula while it works
Public Function ChemRegex(ByVal ChemFormula As String) As Long

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = "([A-Za-z])(?=[A-Z]|$)"
    ChemFormula = .Replace(ChemFormula, "$1-1")
    .Pattern = "\D+"
    ChemFormula = .Replace(ChemFormula, "+")
    ChemRegex = Evaluate(ChemFormula)
End With

End Function
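
To see why the two replacements work, it can help to print the intermediate strings. This is a small trace sketch (the Sub name TraceTwoStepRegex is hypothetical) that repeats the two .Replace calls for CH3NO3 and stops before Evaluate, so it does not depend on Excel.

```vba
Option Explicit

Public Sub TraceTwoStepRegex()
    Dim s As String
    s = "CH3NO3"

    With CreateObject("vbscript.regexp")
        .Global = True

        ' Step 1: a letter followed directly by an uppercase letter (or the
        ' end of the string) has no count, so "-1" is appended; the "-" only
        ' keeps the 1 apart from the group reference $1.
        .Pattern = "([A-Za-z])(?=[A-Z]|$)"
        s = .Replace(s, "$1-1")
        Debug.Print s   ' C-1H3N-1O3

        ' Step 2: every run of non-digits becomes "+", which leaves a sum.
        .Pattern = "\D+"
        s = .Replace(s, "+")
        Debug.Print s   ' +1+3+1+3
    End With
End Sub
```

Evaluating the final string as a worksheet expression gives 1 + 3 + 1 + 3 = 8, which matches the expected total for CH3NO3. As with the other short variants, parentheses are not multiplied out: they are just non-digits that turn into "+".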
