regex: match the second IP address only when it follows specific text

Posted by thigvfpy on 2023-10-22 in Other

I have the following patterns (each example is in a separate file):

example one
-----------

alpha:    192.168.50.0 - 192.168.50.24
delta:    192.168.50.100 - 192.168.50.124
other fields: more stuff
....

example two
-------------

gamma: 200.0.0.0 - 200.0.0.64
lamda: 200.0.0.124 - 200.0.0.255
other  fields: more stuff
....

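I'm parsing these files with Python and trying to find a one-liner regex that matches only on the 'alpha' or 'gamma' line, and only the second IP of the range. So in our examples the result would be 192.168.50.24 and 200.0.0.64.
Something along these lines, which should give me only the second IP: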

(?<=alpha:\s)|(?<=gamma:\s).*
Answer 1 (u5rb5r59)

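Whenever I deal with IP addresses, I rely on the excellent ipaddress library: it makes validity checks easy (use a simpler regex and just try to parse the result), provides comparisons (is an IP in a network?), supports both IPv4 and IPv6, and knows about special ranges (global, link_local, multicast, ...). Beyond that, you could consider a custom class to manage these ranges.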
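A minimal sketch of the ipaddress features listed above, using only the standard library. The helper name valid_ips, the loose [\d.]+ regex and the sample addresses are illustrative assumptions, not part of the original answer.

```python
import ipaddress
import re


def valid_ips(text):
    """Return every dotted-number substring that ipaddress accepts (hypothetical helper)."""
    found = []
    # A deliberately loose regex finds candidates; ipaddress does the validation.
    for candidate in re.findall(r"[\d.]+", text):
        try:
            found.append(ipaddress.ip_address(candidate))
        except ValueError:
            pass  # e.g. "999.1.1.1" or "...." is not an address
    return found


start, end = valid_ips("alpha:    192.168.50.0 - 192.168.50.24")
print(start < end)                                         # True
print(end in ipaddress.ip_network("192.168.50.0/24"))      # True
print(end.is_private)                                      # True
print(ipaddress.ip_address("224.0.0.1").is_multicast)      # True
print(ipaddress.ip_address("169.254.1.1").is_link_local)   # True
print(ipaddress.ip_address("2001:db8::1").version)         # 6
print(valid_ips("bad: 999.1.1.1 - 10.0.0.1"))              # [IPv4Address('10.0.0.1')]
```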
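If you ultimately want the *full* range and are working around that by looking for the second address, something like this may express what you want more clearly and be useful to build on.
(Note that you can then match on the label in your script, e.g. with `if instance.label ...:`.)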

import ipaddress
import re


class IPRange:

    def __init__(self, line=None, *, label=None, start=None, end=None):
        if line and (start or end):
             raise TypeError("only a line or start+end are expected")
        # adjust regex if IPv6 is needed
        # or I'd suggest `.split(":", 1)` and `.split(" - ")` again 
        if line:  # pull start and end from line
            match = re.match(r"^([^:]+):\s*([\d\.]{7,15})\s-\s([\d\.]{7,15})$", line)
            if not match:
                raise ValueError(f"not a valid line: {line}")
            _label, start, end = match.groups()
            label = label or _label  # allow setting label
        self.label    = label  # allow None
        # ValueError if either is not an IP Address
        self.ip_start = ipaddress.ip_address(start)
        self.ip_end   = ipaddress.ip_address(end)
        if self.ip_start > self.ip_end:  # NOTE allows ==
            raise ValueError(f"start must be before end: cmp({self.ip_start} > {self.ip_end})")
        # opportunity to reject special/reserved addresses or ranges

    def __repr__(self):
        return f"{type(self).__name__}({self.label}, {self.ip_start}, {self.ip_end})"

    def __contains__(self, other):
        """ support checking `ip in IPRange` """
        other = ipaddress.ip_address(other)
        # opportunity to reject special/reserved addresses or ranges
        return self.ip_start <= other <= self.ip_end

Example usage

>>> r = IPRange("alpha:    192.168.50.0 - 192.168.50.24")
>>> "192.168.50.1" in r
True
>>> "192.168.50.25" in r
False
>>> print(IPRange(label="foo", start="127.0.0.3", end="127.0.0.5"))
IPRange(foo, 127.0.0.3, 127.0.0.5)
>>> print(IPRange(label="foo", start="127.0.0.3", end="127.0.0.2"))
[..]
ValueError: start must be before end: cmp(127.0.0.3 > 127.0.0.2)

Chewing through multiple lines

# data: the text of both examples from the question
for line in data.splitlines():
    try:
        print(IPRange(line))
    except ValueError as ex:
        print(f"failed: {repr(ex)}")
failed: ValueError('not a valid line: example one')
failed: ValueError('not a valid line: -----------')
failed: ValueError('None does not appear to be an IPv4 or IPv6 address')
IPRange(alpha, 192.168.50.0, 192.168.50.24)
IPRange(delta, 192.168.50.100, 192.168.50.124)
failed: ValueError('not a valid line: other fields: more stuff')
failed: ValueError('not a valid line: ....')
failed: ValueError('None does not appear to be an IPv4 or IPv6 address')
failed: ValueError('not a valid line: example two')
failed: ValueError('not a valid line: -------------')
failed: ValueError('None does not appear to be an IPv4 or IPv6 address')
IPRange(gamma, 200.0.0.0, 200.0.0.64)
IPRange(lamda, 200.0.0.124, 200.0.0.255)
failed: ValueError('not a valid line: other  fields: more stuff')
failed: ValueError('not a valid line: ....')
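The comment in __init__ suggests splitting instead of a regex if IPv6 is needed. A minimal sketch of that idea, assuming the same "label: start - end" layout; parse_line and ends_for are hypothetical helper names, and str.partition(":") stands in for .split(":", 1). It also filters on the label, so only the second IP of the alpha/gamma lines is kept.

```python
import ipaddress


def parse_line(line):
    """Split "label: start - end" into (label, start, end), or raise ValueError.

    Only the FIRST colon separates the label, so IPv6 addresses (which
    contain colons themselves) stay intact.
    """
    label, sep, rest = line.partition(":")
    if not sep:
        raise ValueError(f"not a valid line: {line!r}")
    parts = rest.split(" - ")
    if len(parts) != 2:
        raise ValueError(f"not a valid line: {line!r}")
    # ip_address() raises ValueError for anything that isn't an IP address
    start = ipaddress.ip_address(parts[0].strip())
    end = ipaddress.ip_address(parts[1].strip())
    return label.strip(), start, end


def ends_for(lines, labels=("alpha", "gamma")):
    """Second IP of every parseable line whose label is in `labels`."""
    ends = []
    for line in lines:
        try:
            label, _start, end = parse_line(line)
        except ValueError:
            continue  # header, separator, blank or "other fields" line
        if label in labels:
            ends.append(end)
    return ends


lines = [
    "example one",
    "alpha:    192.168.50.0 - 192.168.50.24",
    "delta:    192.168.50.100 - 192.168.50.124",
    "v6: 2001:db8::1 - 2001:db8::ff",
    "other fields: more stuff",
]
print(ends_for(lines))  # [IPv4Address('192.168.50.24')]
```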

Also take a look at ipaddress.summarize_address_range(start, end); it could be a useful iterator too (an __iter__ method?).
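A short sketch of summarize_address_range on the alpha range, plus a hypothetical iter_range helper showing how an __iter__ method could build on it (iter_range is an assumption, not a stdlib function). Note that the function expects address objects, not strings.

```python
import ipaddress

start = ipaddress.ip_address("192.168.50.0")
end = ipaddress.ip_address("192.168.50.24")

# Yields CIDR networks that together cover start..end, both ends included.
networks = list(ipaddress.summarize_address_range(start, end))
print(networks)
# [IPv4Network('192.168.50.0/28'), IPv4Network('192.168.50.16/29'), IPv4Network('192.168.50.24/32')]


def iter_range(first, last):
    """Yield every address from first to last inclusive (hypothetical helper)."""
    for net in ipaddress.summarize_address_range(first, last):
        yield from net  # iterating a network yields all of its addresses


addresses = list(iter_range(start, end))
print(len(addresses), addresses[0], addresses[-1])  # 25 192.168.50.0 192.168.50.24
```

Inside an IPRange class, __iter__ could simply return iter_range(self.ip_start, self.ip_end).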

Answer 2 (ubbxdtey)

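In the pattern you tried, the .* only applies to the second part of the alternation |.
But even if you fix the alternation by grouping only the lookbehinds, .* would still match the rest of the line, not just the second IP.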
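To see both problems concretely, a small check of the original attempt and of a version with the lookbehinds grouped. The single-line inputs are copied from the question's examples.

```python
import re

alpha_line = "alpha:    192.168.50.0 - 192.168.50.24"
gamma_line = "gamma: 200.0.0.0 - 200.0.0.64"

# Original attempt: `.*` belongs only to the second alternative,
# so on the alpha line the first alternative matches an empty string.
original = re.compile(r"(?<=alpha:\s)|(?<=gamma:\s).*")
print(original.findall(alpha_line))  # ['']
print(original.findall(gamma_line))  # ['200.0.0.0 - 200.0.0.64']

# Grouping the lookbehinds fixes the alternation, but `.*` still
# consumes the rest of the line, including the first IP address.
grouped = re.compile(r"(?:(?<=alpha:\s)|(?<=gamma:\s)).*")
print(repr(grouped.search(alpha_line).group()))  # '   192.168.50.0 - 192.168.50.24'
```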
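Since the example data seems to have a varying number of leading spaces (and Python's re only supports fixed-width lookbehinds, so (?<=alpha:\s+) is not an option), you could use a capture group for the second IP while matching the first IP and the whitespace before it.
This is a shortened pattern that captures the part from the example data. If you want a stricter match for IP addresses, see How to Find or Validate an IP Address.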

\b(?:alpha|gamma):\s+\d[\d.]+\s+-\s+(\d[\d.]+)

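The pattern matches:

  • \b is a word boundary that prevents a partial word match
  • (?:alpha|gamma): matches alpha: or gamma:
  • \s+\d[\d.]+ matches 1+ whitespace characters, a digit, and then 1+ digits or dots
  • \s+-\s+ matches a hyphen between 1+ whitespace characters on each side
  • (\d[\d.]+) is capture group 1, matching a digit followed by 1+ digits or dots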

See the regex demo.
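A minimal sketch of using the pattern from Python: re.findall returns only the capture group's text when the pattern has exactly one group. The two sample strings reproduce the question's example files.

```python
import re

PATTERN = re.compile(r"\b(?:alpha|gamma):\s+\d[\d.]+\s+-\s+(\d[\d.]+)")

example_one = """alpha:    192.168.50.0 - 192.168.50.24
delta:    192.168.50.100 - 192.168.50.124
other fields: more stuff
"""
example_two = """gamma: 200.0.0.0 - 200.0.0.64
lamda: 200.0.0.124 - 200.0.0.255
other  fields: more stuff
"""

# One capturing group, so findall() yields only the second IP.
for text in (example_one, example_two):
    print(PATTERN.findall(text))
# ['192.168.50.24']
# ['200.0.0.64']
```

For a real file, pass its contents instead of the sample string. If each file has only one such line, PATTERN.search(text).group(1) works too, but search returns None when nothing matches, so check the result first.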

Answer 3 (hts6caw3)

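You can turn it into a dictionary without much hassle and point directly at whatever you want. Even though it's 4 lines of actual string manipulation, it may still be more efficient than a regex.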

data = """
alpha:    192.168.50.0 - 192.168.50.24
delta:    192.168.50.100 - 192.168.50.124
other fields: more stuff
"""

import json

out = {}
for line in data.strip().split('\n'):
    if not line: continue

    k, v = line.split(':', 1)      # get key:value pair (first colon only)
    v    = v.strip().split(' - ')  # attempt a list

    # if len > 1 keep it a list, else grab the single value
    out[k.strip()] = v if len(v) > 1 else v[0]

# PROOF
print(json.dumps(out, indent=4))
"""
{
    "alpha": [
        "192.168.50.0",
        "192.168.50.24"
    ],
    "delta": [
        "192.168.50.100",
        "192.168.50.124"
    ],
    "other fields": "more stuff"
}
"""

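One benefit of this is that you can cast the values as you go (if that's what you need). That way the dict not only holds all the values, but they are all of the correct type. I don't see anything in the example you posted that would benefit from this, but maybe you have a bunch of booleans, floats or ints in the other_fields section.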

data = """
alpha:    192.168.50.0 - 192.168.50.24
delta:    192.168.50.100 - 192.168.50.124
other fields: more stuff
"""

import re

MTCFLOAT = re.compile(r'^-?\d*\.\d+$').match
MTCINT   = re.compile(r'^-?\d+$').match

#cast input to appropriate type
def cast(v:str|None):
    #None
    if v is None: return None
    
    #force to str and strip
    v = f'{v}'.strip()
    
    #bool
    if (vl := v.lower()) in ('true','false'):
        return vl == 'true'
        
    #str, int, float - typed as str if not a None, bool, int or float (ie. str by default)
    _i = bool(MTCINT(v))
    _f = bool(MTCFLOAT(v))
    v  = (str,int,float)[_f<<1|_i](v)
            
    return v

out = {}
for line in data.strip().split('\n'):
    if not line: continue
    k, v = line.split(':', 1)  # first colon only, so values may contain ':'
    v = [cast(_) for _ in v.split(' - ')]
    out[cast(k)] = v if len(v) > 1 else v[0]

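Then you can combine all of this, add some dynamic elements, and end up with a small whack-a-mole parser that can handle all of these non-formats that have separated key:value pairs.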

#utils.py

import re
from dataclasses import dataclass

MTCFLOAT = re.compile(r'^-?\d*\.\d+$').match
MTCINT   = re.compile(r'^-?\d+$').match

@dataclass
class Separators:
    pair  : str = ':'
    list  : str = ','
    entry : str = '\n'
    

class Data2Dict:
    def __init__(self, sep:Separators) -> None:
        self._sep = sep
            
    def cast(self, v:str|None) -> None|int|float|str|bool:
        if v is None: return None
        
        v = f'{v}'.strip()
        
        if (vl := v.lower()) in ('true','false'):
            return vl == 'true'
            
        _i = bool(MTCINT(v))
        _f = bool(MTCFLOAT(v))
        v  = (str,int,float)[_f<<1|_i](v)
                
        return v
        
    def parse(self, data: str, sep: Separators | None = None) -> dict:
        sep = self._sep = (sep or self._sep) 
        
        out = {}
        for line in data.strip().split(sep.entry):
            if not line: continue

            k, v = line.split(sep.pair, 1)  # split on the first separator only
            v = [self.cast(_) for _ in v.split(sep.list)]

            out[self.cast(k)] = v if len(v) > 1 else v[0]   
        
        return out

#main.py

from utils import *

data = """
alpha:    192.168.50.0 - 192.168.50.24
delta:    192.168.50.100 - 192.168.50.124
other fields: more stuff
"""

results = Data2Dict(Separators(list=' - ')).parse(data)
print(results['alpha'][1])  # 192.168.50.24

Answer 4 (guz6ccqo)

import re
from contextlib import ExitStack

# groups: label, text up to and including the hyphen, the second IP
pattern = re.compile(r'^(\w+)(.*?-\s*)(.*)$')
choices = ('alpha', 'gamma')
file_list = ['f1.txt', 'f2.txt']

with ExitStack() as stack:
    files = [stack.enter_context(open(filename)) for filename in file_list]
    for f in files:
        for line in f:
            line = line.rstrip('\n')  # avoid printing blank lines
            if line.split(':')[0] in choices:
                print(pattern.sub(r'\1 \3', line))

alpha 192.168.50.24
gamma 200.0.0.64
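ExitStack keeps all files open at once, which isn't needed when they are read one after another. A minimal alternative sketch: a hypothetical second_ips generator that works on any iterable of lines, so each file can be opened in a plain with block (f1.txt and f2.txt are the answer's placeholder names).

```python
import re

# label, then the first IP, then capture the second IP
LINE = re.compile(r"^(\w+):\s*[\d.]+\s*-\s*([\d.]+)\s*$")


def second_ips(lines, labels=("alpha", "gamma")):
    """Yield (label, second_ip) for matching lines in any iterable of lines."""
    for line in lines:
        m = LINE.match(line)
        if m and m.group(1) in labels:
            yield m.group(1), m.group(2)


# Usage with real files:
# for filename in ["f1.txt", "f2.txt"]:
#     with open(filename) as f:
#         for label, ip in second_ips(f):
#             print(label, ip)
```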
