regex: Extracting text with an optional group using a regular expression with re.VERBOSE

Posted by bd1hkmkf on 2023-08-08 in Other

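I am using re.VERBOSE to extract some information from HTML text. Because HKDMOPrate does not always appear in the text, I made it an optional group. However, the code does not work as expected. Here is my code: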

import re

def get_primerate_result(text):
    # Raw string so the regex escapes (\s, \., \() reach the re module unchanged
    pattern = r"""
        (effect\sfrom\s)
        (?P<Date>[A-Za-z0-9\,\s]+)
        (\s\()
        (.*USD\sprime\srate)
        (.*to\s)
        (?P<USDrate>[0-9\.\%]+)
        (\sp\.a\.)
        ((.*HKD\sand\sMOP\sprime\srate)
        (.*to\s)
        (?P<HKDMOPrate>[0-9\.\%]+)
        (\sp\.a\.))?
    """

    dict_result=[i.groupdict() for i in re.finditer(pattern, text, re.VERBOSE)]
    return dict_result

Here are two sample inputs:

Input:
Text 1:

'Dear Customers,\nWith the Federal Reserve System raising its federal funds rate by 0.25%, our bank is here to announce that with effect from July 28, 2023 (Friday), our USD prime rate will be increased from 8.25% p.a. to 8.50% p.a., our HKD and MOP prime rate will be increased from 6.00% p.a. to 6.125% p.a.\nBank of China Limited Macau Branch\nBank of China (Macau) Limited\nJuly 27, 2023\nPlease click to check:\n\nPrime Rate\n'

Text 2:

'Dear Customers,\nWith the Federal Reserve System raising its federal funds rate by 0.25%, our bank is here to announce that with effect from March 24, 2023 (Friday), our USD prime rate will be increased from 7.75% p.a. to 8.00% p.a.\nBank of China Limited Macau Branch\nBank of China (Macau) Limited\nMarch 24, 2023\nPlease click to check\n\nPrime Rate\n'

Here is the output I want:

result 1: [{'Date': 'July 28, 2023', 'USDrate': '8.50%', 'HKDMOPrate': '6.125%'}]
result 2: [{'Date': 'March 24, 2023', 'USDrate': '8.00%', 'HKDMOPrate': None}]

Actual output:

result 1: [{'Date': 'July 28, 2023', 'USDrate': '6.125%', 'HKDMOPrate': None}]
result 2: [{'Date': 'March 24, 2023', 'USDrate': '8.00%', 'HKDMOPrate': None}]

bvuwiixz

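As InSync suggested, the problem is solved by making the .* before to lazy: (.*?to\s). A greedy .* runs to the end of the line and backtracks only as far as the last "to ", which in text 1 belongs to the HKD and MOP clause, so USDrate captures 6.125% and nothing is left for the optional group. The lazy .*? stops at the first "to " that lets the rest of the pattern match.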
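
A minimal sketch of the fix applied to the pattern from the question. It makes two assumptions: both .* runs before "to" are made lazy (the answer shows only the fragment (.*?to\s) and does not say which occurrence it means), and the sample strings are shortened versions of the two inputs above.

```python
import re

# Pattern from the question with the ".*" before "to" changed to ".*?".
# Assumption: both occurrences are made lazy.
PATTERN = re.compile(r"""
    (effect\sfrom\s)
    (?P<Date>[A-Za-z0-9\,\s]+)
    (\s\()
    (.*USD\sprime\srate)
    (.*?to\s)                       # lazy: stops at the first usable to
    (?P<USDrate>[0-9\.\%]+)
    (\sp\.a\.)
    ((.*HKD\sand\sMOP\sprime\srate)
    (.*?to\s)
    (?P<HKDMOPrate>[0-9\.\%]+)
    (\sp\.a\.))?
""", re.VERBOSE)


def get_primerate_result(text):
    """Return one dict of named groups per announcement found in text."""
    return [m.groupdict() for m in PATTERN.finditer(text)]


# Shortened versions of the question's sample inputs (assumption: the
# omitted leading and trailing sentences do not affect the match).
text1 = (
    "Our bank is here to announce that with effect from July 28, 2023 "
    "(Friday), our USD prime rate will be increased from 8.25% p.a. to "
    "8.50% p.a., our HKD and MOP prime rate will be increased from "
    "6.00% p.a. to 6.125% p.a.\nBank of China Limited Macau Branch\n"
)
text2 = (
    "Our bank is here to announce that with effect from March 24, 2023 "
    "(Friday), our USD prime rate will be increased from 7.75% p.a. to "
    "8.00% p.a.\nBank of China Limited Macau Branch\n"
)

print(get_primerate_result(text1))
# [{'Date': 'July 28, 2023', 'USDrate': '8.50%', 'HKDMOPrate': '6.125%'}]
print(get_primerate_result(text2))
# [{'Date': 'March 24, 2023', 'USDrate': '8.00%', 'HKDMOPrate': None}]
```

Because . does not match a newline without re.DOTALL, the optional group cannot reach past the end of the announcement line, which is why HKDMOPrate is None for the second text.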
