Using a regex as a template with Python

Posted by xoshrz7s on 2023-02-14 in Python


I would like to use a regex pattern as a template and was wondering whether there is a convenient way to do so in Python (Python 3 or newer).

import re

pattern = re.compile("/something/(?P<id>.*)")
pattern.populate(id=1) # that is what I'm looking for

should result in

/something/1


Source: https://stackoverflow.com/questions/3672244/using-a-regex-as-a-template-with-python

5 Answers

lf5gs5x2 #1

This is not what regexes are for; you can just use normal string formatting.

>>> '/something/{id}'.format(id=1)
'/something/1'
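
If the same URL also has to be parsed, one option is to keep a format string and a regex side by side. This is only a minimal sketch; `URL_TEMPLATE` and `URL_PATTERN` are hypothetical names chosen for illustration, not part of any library.

```python
import re

# Hypothetical names: a format string for building, a regex for parsing.
URL_TEMPLATE = "/something/{id}"
URL_PATTERN = re.compile(r"/something/(?P<id>.*)")

# Build the string from the template, then parse it back with the regex.
url = URL_TEMPLATE.format(id=1)
match = URL_PATTERN.fullmatch(url)

print(url)
print(match.group("id"))
```

The drawback is that the two definitions can drift apart, which is what the other answers try to avoid.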

2023-02-14

bvjveswy #2

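Below is a lightweight class I created that does what you're looking for. You can write a single regular expression and use it both for matching strings and for generating strings.
There is a small example of how to use it at the bottom of the code.
Generally, you construct the regular expression as usual and use the match and search functions as usual. The format function is used much like string.format to generate a new string.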

import re
regex_type = type(re.compile(""))

# This is not perfect. It breaks if there is a parenthesis in the regex.
re_term = re.compile(r"(?<!\\)\(\?P\<(?P<name>[\w_\d]+)\>(?P<regex>[^\)]*)\)")

class BadFormatException(Exception):
    pass

class RegexTemplate(object):
    def __init__(self, r, *args, **kwargs):
        self.r = re.compile(r, *args, **kwargs)
    
    def __repr__(self):
        return "<RegexTemplate '%s'>"%self.r.pattern
    
    def match(self, *args, **kwargs):
        '''The regex match function'''
        return self.r.match(*args, **kwargs)
    
    def search(self, *args, **kwargs):
        '''The regex search function'''
        return self.r.search(*args, **kwargs)
    
    def format(self, **kwargs):
        '''Format this regular expression in a similar way as string.format.
        Only supports true keyword replacement, not group replacement.'''
        pattern = self.r.pattern
        def replace(m):
            name = m.group('name')
            reg = m.group('regex')
            val = kwargs[name]
            # fullmatch so the whole value, not just a prefix, must satisfy the group
            if not re.fullmatch(reg, val):
                raise BadFormatException("Template variable '%s' has a value "
                    "of %s, does not match regex %s."%(name, val, reg))
            return val
        
        # The regex sub function does most of the work
        value = re_term.sub(replace, pattern)
        
        # Now we have to un-escape the special characters.
        return re.sub(r"\\([.\(\)\[\]])", r"\1", value)

def compile(*args, **kwargs):
    return RegexTemplate(*args, **kwargs)
    
if __name__ == '__main__':
    # Construct a typical URL routing regular expression
    r = RegexTemplate(r"http://example\.com/(?P<year>\d\d\d\d)/(?P<title>\w+)")
    print(r)
    
    # This should match
    print(r.match("http://example.com/2015/article"))
    # Generate the same URL using url formatting.
    print(r.format(year = "2015", title = "article"))
    
    # This should not match
    print(r.match("http://example.com/abcd/article"))
    # This will raise an exception because year is not formatted properly
    try:
        print(r.format(year = "15", title = "article"))
    except BadFormatException as e:
        print(e)

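There are some limitations:

The format function only works with keyword arguments (you can't use \1-style group references the way string.format accepts positional arguments).
There is also a bug with matching groups that contain sub-groups, e.g. RegexTemplate(r'(?P<foo>biz(baz)?)'). This could be corrected with some work.
If your regular expression contains character classes outside of a named group (e.g. [a-z123]), we will not know how to format them.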
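
The sub-group limitation can be seen directly. The sketch below uses a simplified stand-in for `re_term` (the name `named_group` is hypothetical) with the same "no closing parenthesis inside the group body" rule, and shows where the scan stops on a nested group.

```python
import re

# Simplified stand-in for re_term: the group body may not contain ")".
named_group = re.compile(r"\(\?P<(?P<name>\w+)>(?P<regex>[^)]*)\)")

source = r"(?P<foo>biz(baz)?)"
m = named_group.search(source)

# The body is cut at the first ")", so the inner group is split in two.
captured = m.group("regex")
leftover = source[m.end():]

print(captured)
print(leftover)
```

The captured body is no longer a valid regex on its own, and the leftover text would end up in the formatted output.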

2023-02-14

ogsagwnx #3

Save the compile until after the substitution:

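# Substitute the value first, then compile. Putting the value inside (?P<...>)
# would make it the group name, and "1" is not a valid group name.
pattern = re.compile("/something/%s" % 1)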
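
When the substituted value can contain regex metacharacters, it is probably worth escaping it before compiling. A minimal sketch, assuming a hypothetical helper named `compile_with_value`:

```python
import re

def compile_with_value(value):
    # Hypothetical helper: escape the value so that characters such as "."
    # are matched literally instead of being treated as regex syntax.
    return re.compile("/something/" + re.escape(str(value)))

pattern = compile_with_value("a.b")

print(bool(pattern.fullmatch("/something/a.b")))
print(bool(pattern.fullmatch("/something/axb")))
```

Note that this gives a pattern that matches one specific string; it does not produce the string itself, which is what the question asks for.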

2023-02-14

azpvetkf #4

For very simple cases, the easiest way is probably to replace the named capture groups with format fields.
Here is a basic validator/formatter:

import re
from functools import partial

unescape = partial(re.compile(r'\\(.)').sub, r'\1')
namedgroup = partial(re.compile(r'\(\?P<(\w+)>.*?\)').sub, r'{\1}')

class Mould:
    def __init__(self, pattern):
        self.pattern = re.compile(pattern)
        self.template = unescape(namedgroup(pattern))

    def format(self, **values):
        try:
            return self.template.format(**values)
        except KeyError as e:
            raise TypeError(f'Missing argument: {e}') from None

    def search(self, string):
        try:
            return self.pattern.search(string).groupdict()
        except AttributeError:
            raise ValueError(string) from None

For example, to instantiate a validator/formatter for phone numbers of the form (XXX) YYY-ZZZZ:

template = r'\((?P<area>\d{3})\)\ (?P<prefix>\d{3})\-(?P<line>\d{4})'
phonenum = Mould(template)

Then:

>>> phonenum.search('(333) 444-5678')
{'area': '333', 'prefix': '444', 'line': '5678'}

>>> phonenum.format(area=111, prefix=555, line=444)
'(111) 555-444'

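But this is a very basic framework that ignores many regex features (lookarounds or non-capturing groups, for example). If you need them, things get messy very quickly. In that case, going the other way around and generating the pattern from the template, although more verbose, can be more flexible and less error-prone.
Here is the basic validator/formatter (.search() and .format() are the same as above):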

import string
import re

FMT = string.Formatter()

class Mould:
    def __init__(self, template, **kwargs):
        self.template = template
        self.pattern = self.make_pattern(template, **kwargs)

    @staticmethod
    def make_pattern(template, **kwargs):
        pattern = ''
        # for each field in the template, add to the pattern
        for text, field, *_ in FMT.parse(template):
            # the escaped preceding text
            pattern += re.escape(text)
            if field:
                # a named regex capture group
                pattern += f'(?P<{field}>{kwargs[field]})'
            # XXX: if there's text after the last field,
            #   the parser will iterate one more time,
            #   hence the 'if field'
        return re.compile(pattern)

Instantiation:

template = '({area}) {prefix}-{line}'
content  = dict(area=r'\d{3}', prefix=r'\d{3}', line=r'\d{4}')
phonenum = Mould(template, **content)

Execution:

>>> phonenum.search('(333) 444-5678')
{'area': '333', 'prefix': '444', 'line': '5678'}

>>> phonenum.format(area=111, prefix=555, line=444)
'(111) 555-444'
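
Since the second class listing leaves out .search() and .format(), here is a self-contained sketch that puts the pieces together. The class name `TemplateMould` is a hypothetical name used to avoid clashing with the two `Mould` versions; the logic follows the template-to-pattern approach described in this answer.

```python
import re
import string

class TemplateMould:
    """Hypothetical name: a str.format template plus the regex derived from it."""

    def __init__(self, template, **field_patterns):
        self.template = template
        parts = []
        for text, field, _spec, _conv in string.Formatter().parse(template):
            parts.append(re.escape(text))      # literal text, escaped
            if field:                          # None when only trailing text is left
                parts.append(f"(?P<{field}>{field_patterns[field]})")
        self.pattern = re.compile("".join(parts))

    def format(self, **values):
        return self.template.format(**values)

    def search(self, text):
        m = self.pattern.search(text)
        if m is None:
            raise ValueError(text)
        return m.groupdict()

phonenum = TemplateMould("({area}) {prefix}-{line}",
                         area=r"\d{3}", prefix=r"\d{3}", line=r"\d{4}")
print(phonenum.search("(333) 444-5678"))
print(phonenum.format(area=111, prefix=555, line=4444))
```

As in the original, .format() does not validate its arguments against the field patterns; adding a check with re.fullmatch would be one way to do that.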

2023-02-14

m528fe3b #5

If your regex is just a bunch of named groups joined by some predefined string, you can convert it to a template string like this:

from string import Template
def pattern2template(regex, join_string):
    tmpl_str = join_string.join(["$"+x for x in regex.groupindex.keys()])
    # prepend string to match your case
    tmpl_str = join_string + tmpl_str
    return Template(tmpl_str)

In your case, this gives:

>>> x = pattern2template(pattern, "/something/")
>>> print(x.template)
/something/$id
>>> print(x.substitute(id="myid"))
/something/myid
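
The same idea extends to patterns with several named groups. The sketch below is a hypothetical variant (`pattern_to_template`, with separate `prefix` and `separator` parameters) that orders the names by group index instead of relying on dictionary order.

```python
import re
from string import Template

def pattern_to_template(regex, prefix, separator):
    # Hypothetical variant: sort group names by their position in the pattern.
    names = sorted(regex.groupindex, key=regex.groupindex.get)
    return Template(prefix + separator.join("$" + name for name in names))

regex = re.compile(r"/archive/(?P<year>\d{4})/(?P<month>\d{2})")
tmpl = pattern_to_template(regex, "/archive/", "/")

print(tmpl.template)
print(tmpl.substitute(year="2015", month="06"))
```

Like the function above, this only works when the literal text between groups is known in advance; it is not recovered from the regex.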

2023-02-14
