regex: How to limit regular expression matching time in Python 3

zhte4eai  posted on 2023-08-08  in  Python

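I am using the regex matching function re.match(pattern, str) (Python 3.10, Windows 10), but when a pattern is badly written, catastrophic backtracking sometimes occurs. As a result, the program gets stuck inside re.match and cannot continue.
I have a lot of regular expressions, so I cannot fix them one by one.
I have tried limiting the execution time of the function, but because I am on Windows, none of these methods work: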

  • signal (Unix only)
  • func-timeout
  • timeout-decorator
  • eventlet

My test function is below. I tried the answer from "How to limit execution time of a function call?", but it does not work:

import re
import threading
import _thread
from contextlib import contextmanager

class TimeoutException(Exception):
    def __init__(self, msg=''):
        self.msg = msg

@contextmanager
def time_limit(seconds, msg=''):
    # After `seconds`, ask the main thread to raise KeyboardInterrupt
    timer = threading.Timer(seconds, lambda: _thread.interrupt_main())
    timer.start()
    try:
        yield
    except KeyboardInterrupt:
        raise TimeoutException("Timed out for operation {}".format(msg))
    finally:
        # if the action ends in specified time, timer is canceled
        timer.cancel()

def my_func():
    astr = "http://www.fapiao.com/dzfp-web/pdf/download?request=6e7JGm38jfjghVrv4ILd-kEn64HcUX4qL4a4qJ4-CHLmqVnenXC692m74H5oxkjgdsYazxcUmfcOH2fAfY1Vw__%5EDadIfJgiEf"
    # Raw string so that \/ reaches the regex engine unchanged
    pattern = r"^([hH][tT]{2}[pP]://|[hH][tT]{2}[pP][sS]:)(([A-Za-z0-9-~]+).)+([A-Za-z0-9-~\/])+$"
    reg = re.compile(pattern)
    result = reg.match(astr)
    return result

if __name__ == '__main__':
    try:
        # Intended use: give up on my_func() after 1 second
        with time_limit(1, 'my_func'):
            my_func()
    except TimeoutException as e:
        print(e.msg)

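So, is there a way to:

  • stop re.match when "catastrophic backtracking" occurs
  • limit the number of matching steps or the matching time, or raise an Exception when matching takes too long
  • or limit the execution time of a function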

pvcm50d1  #1

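One approach I know of is to launch a child process and terminate it if it does not finish within a certain amount of time. The result of the "worker" function my_func must now be passed back through a managed queue instance: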

def my_func(result_queue):
    import re

    astr = "http://www.fapiao.com/dzfp-web/pdf/download?request=6e7JGm38jfjghVrv4ILd-kEn64HcUX4qL4a4qJ4-CHLmqVnenXC692m74H5oxkjgdsYazxcUmfcOH2fAfY1Vw__%5EDadIfJgiEf"
    # Raw string so that \/ reaches the regex engine unchanged
    pattern = r"^([hH][tT]{2}[pP]://|[hH][tT]{2}[pP][sS]:)(([A-Za-z0-9-~]+).)+([A-Za-z0-9-~\/])+$"
    reg = re.compile(pattern)
    result = reg.match(astr)
    # Cannot pickle a match object, so we must send back a plain dict
    # (or None when the pattern does not match):
    if result is None:
        result_queue.put(None)
    else:
        result_queue.put(
            {
                'span': result.span(),
                'group0': result[0],
                'groups': result.groups()
            }
        )

if __name__ == '__main__':
    from multiprocessing import Process, Manager

    with Manager() as manager:
        result_queue = manager.Queue()
        p = Process(target=my_func, args=(result_queue,))
        p.start()
        p.join(1)  # Allow up to 1 second for the process to complete
        if p.exitcode is None:
            # The process has not completed, so kill it:
            print('killing process')
            p.terminate()
            p.join()  # Wait for the terminated process to be cleaned up
        else:
            # The process has completed, so get the result:
            result = result_queue.get()
            print(result)
            p.join()  # This should return immediately since the process has completed.
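
As a variation on the same idea, here is a minimal sketch that runs the match in a separate Python interpreter through subprocess.run, whose timeout argument kills the child once the limit expires. The names match_with_timeout, RegexTimeout and _CHILD_CODE are made up for this illustration and are not part of any library. The JSON summary mirrors the dict used above, except that tuples come back as lists. This avoids the managed queue, at the cost of starting a new interpreter for every call.

```python
import json
import subprocess
import sys

# Code executed by the child interpreter (hypothetical helper script):
# read the pattern and text from stdin, try the match, and print a JSON
# summary, or null when there is no match.
_CHILD_CODE = r"""
import json, re, sys
job = json.load(sys.stdin)
m = re.match(job["pattern"], job["text"])
summary = None if m is None else {
    "span": m.span(), "group0": m[0], "groups": m.groups()
}
json.dump(summary, sys.stdout)
"""


class RegexTimeout(Exception):
    """Raised when the match does not finish within the time limit."""


def match_with_timeout(pattern, text, timeout=1.0):
    """Run re.match(pattern, text) in a child interpreter.

    Returns a dict summary of the match (or None for no match) and raises
    RegexTimeout if the child is still running after `timeout` seconds.
    """
    payload = json.dumps({"pattern": pattern, "text": text})
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD_CODE],
            input=payload,
            capture_output=True,
            text=True,
            timeout=timeout,  # subprocess.run kills the child on expiry
            check=True,       # an invalid pattern raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        raise RegexTimeout(f"no result within {timeout} s") from None
    return json.loads(done.stdout)
```

Calling match_with_timeout(pattern, astr, timeout=1) with the pattern and string from the question should raise RegexTimeout, since that match does not finish in any reasonable time. The timeout also has to cover interpreter start-up, so very small limits are not practical with this approach.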
